I think we can start now. Thank you, everyone, for being here today to listen to our presentation. I'm very glad to tell you our story and how we built our cloud infrastructure, public, private, and everything around it. Let me introduce myself: I'm Duc from Viettel in Vietnam, and here is my colleague. And my name is Vinh; you can call me Tau Vinh. We are both organizers of the Vietnam OpenInfra community and members of the OpenInfra community.

So here is our story. First of all, we are from Vietnam, and Viettel is the largest telco in Vietnam. We have about 170 million subscribers across 10 countries in Asia, America, and Africa. We now have nearly 8,000 racks across several thousand square meters of data center floor. Today we have 8 data centers in Vietnam, and by 2025 we will have 13. For the cloud infrastructure, we currently have 150,000 physical cores and over 30 petabytes of storage, including Ceph and SAN storage.

Here is the list of cloud services that Viettel provides, from the infrastructure layer and colocation up to platform, application, and consulting and managed services. Most of these services we developed ourselves; besides that, we provide a marketplace where other companies and partners can offer their solutions on our infrastructure.

So here is the main part of this presentation. In 2013 we faced a lot of difficulty managing our infrastructure. First of all, we had a lot of vendors using physical servers and virtual machines based on VMware or other hypervisors. Besides that, we used SAN storage from 7 different vendors. This fragmentation made deployment, operation, and expansion very difficult, because we operate the whole infrastructure for 10 countries from a single network operations center in Vietnam. The manual deployment process took a lot of time, and we had to integrate with many vendors from other countries. Operational problems such as monitoring, warnings, and alerts were difficult whenever we brought new hardware into the network, and interaction between vendors was hard.

At that time we had to decide which model to follow for IT infrastructure and telco infrastructure. The questions were: which architecture do we choose, which core technology do we choose, and how do we build the human resources?

Here is the infrastructure. We decided to separate the IT and telco clouds into small deployments. We have several private cloud clusters, mostly based on OpenStack and Ceph. We have only one data center inventory management system, only one orchestration system, and only one global network operations center. For the IT cloud, we use only OpenStack; anything that cannot run on OpenStack is deployed on physical servers.

Regarding the infrastructure, we use open technology as the foundation. OpenStack covers the general computing needs: we provide virtual machines and also some HPC workloads using GPU services. For storage we use Ceph for software-defined storage plus several kinds of SAN storage, depending on the deployment and the requirements of the application, but mostly Ceph across the whole infrastructure. For the network we use the standard OpenStack topology, which I will show in the next slide. Other supporting technology is based on open source products like Prometheus and the other products you see here. Here is the picture as of 2023.
In 2018 we focused only on the infrastructure: OpenStack, Ceph, KVM, Open vSwitch, and Linux. To do that we followed the open technology, open infrastructure, open ecosystem approach, and we trained our colleagues to contribute to OpenStack and the wider open source community: Prometheus, Kubespray, Cluster API. Besides that, as you can see in the corner, there are some projects that we built ourselves and published on our GitHub under VCloud. These came out of the difficulties we faced operating the whole infrastructure: for example, we have a lot of SAN storage from five vendors, so we developed an exporter for Prometheus to monitor all the SANs. And for some things that OpenStack does not fully support, we decided to build them ourselves and publish them.

Here is the timeline. In 2018 the first cloud and the first team were built, based on OpenStack, Ceph, and Prometheus. The next year the first cluster was running, 30% of the legacy IT infrastructure was live and had moved into the cloud, and we built a team to migrate to a microservice architecture. After two years another cluster was built in Hanoi, 70% of the legacy IT had moved to the new infrastructure, and we also provided solutions to three other countries. In 2021 we built the first cloud cluster in Ho Chi Minh City, in the south of Vietnam, and about 80% of the legacy IT infrastructure had been migrated; because some databases do not work well in the cloud, we still keep around 20% running on bare metal or physical servers, such as Oracle databases. It was also time to move to other countries. Today we have eight OpenStack clouds, 90% of the legacy infrastructure migrated, and we have become a public cloud provider in Vietnam, selling services to government, banking, and financial services customers.

Now for the main part: our lessons learned. Here is the big picture of the architecture. We divide Vietnam into three regions: the north, the south, and the center. In each region we have several data centers, and each data center is separated into availability zones, all connected to the backbone at a rate of at least 200 gigabits per second. In each AZ we use a leaf-spine network topology, with VXLAN to extend the network between different locations, making sure the networks are connected across Vietnam.

For the OpenStack deployment: does anyone here know about OpenStack? Yes. OpenStack is quite difficult to deploy, and when you extend a cluster many problems appear. So one lesson is that, even now, we do not put too many compute nodes into a single OpenStack cluster. At our scale we deploy only 500 to 600 compute nodes per cluster, hosting maybe 10,000 VMs each; with more compute nodes the whole cluster slows down and many problems occur. You need to pay attention to the response time of the OpenStack API and the database queries as your cluster grows. A slow API hurts the user experience: imagine clicking in Horizon or running the command line and waiting a long time for a response, sometimes even a timeout. One more thing about metrics: do not collect too many metrics, because it increases the load on the whole system, not only on the monitoring system; the core cluster itself can become heavily loaded because of monitoring.
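As an illustration of the exporter idea mentioned above: this is only a minimal sketch of what a small Prometheus exporter for SAN capacity can look like, not Viettel's actual code. The metric names and the `collect_san_capacity` function are hypothetical placeholders; a real exporter would query each vendor's management API. It also reflects the point about keeping the number of exported metrics small.

```python
# Minimal sketch of a Prometheus exporter for SAN capacity (hypothetical, not the talk's code).
# Assumes the prometheus_client library; the SAN polling function is a placeholder.
import time
from prometheus_client import Gauge, start_http_server

# Keep the metric set small: a handful of gauges, not every counter the array exposes.
SAN_CAPACITY_BYTES = Gauge(
    "san_pool_capacity_bytes", "Total capacity of a SAN pool", ["array", "pool"]
)
SAN_USED_BYTES = Gauge(
    "san_pool_used_bytes", "Used capacity of a SAN pool", ["array", "pool"]
)

def collect_san_capacity():
    """Placeholder: a real exporter would call each vendor's management API here."""
    return [
        {"array": "array-01", "pool": "pool-a", "capacity": 100 * 2**40, "used": 63 * 2**40},
    ]

if __name__ == "__main__":
    start_http_server(9101)          # Prometheus scrapes http://<host>:9101/metrics
    while True:
        for pool in collect_san_capacity():
            SAN_CAPACITY_BYTES.labels(pool["array"], pool["pool"]).set(pool["capacity"])
            SAN_USED_BYTES.labels(pool["array"], pool["pool"]).set(pool["used"])
        time.sleep(60)               # poll once a minute to avoid hammering the arrays
```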
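And on the earlier point about OpenStack API and database response time: it helps to time a few representative API calls regularly and flag when they slow down as the cluster grows. Below is a minimal sketch using the openstacksdk library; the cloud name "mycloud" and the 2-second threshold are illustrative assumptions, not production values from the talk.

```python
# Rough probe of OpenStack API latency using openstacksdk (a sketch, not production code).
# Assumes a cloud named "mycloud" in clouds.yaml; the threshold is an arbitrary example.
import time
import openstack

THRESHOLD_SECONDS = 2.0  # example value; tune to your own service level target

def timed(label, fn):
    start = time.perf_counter()
    result = list(fn())              # force the generator so the API is actually called
    elapsed = time.perf_counter() - start
    status = "SLOW" if elapsed > THRESHOLD_SECONDS else "ok"
    print(f"{label:<22} {len(result):>6} objects in {elapsed:6.2f}s [{status}]")

if __name__ == "__main__":
    conn = openstack.connect(cloud="mycloud")
    timed("compute.servers", lambda: conn.compute.servers(all_projects=True, limit=200))
    timed("network.ports", lambda: conn.network.ports(limit=200))
    timed("compute.hypervisors", lambda: conn.compute.hypervisors())  # needs admin credentials
```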
For the deployment, we found that each time we build a cluster we buy several models of CPU, so VMs on that cluster cannot migrate between them. You need to pin the CPU model in the Nova configuration. For one AZ, if you have 10 or 20 compute nodes, unify them on one CPU model for the VMs inside it; if you have compute nodes with a different CPU model, separate them into another cluster, so you end up with two clusters of compute nodes. The same goes for storage: if you use Ceph you can share the same storage backend, but if you use SAN you have to separate clusters by the SAN switches the compute nodes are connected to. For live migration of virtual machines, you have to configure the CPU model, and during the live migration make sure the RAM is not changing too much, so the virtual machine can move from one compute node to another smoothly.

Also pay attention to the default configuration of OpenStack. Some periodic tasks can create heavy database query load as the number of compute nodes grows, so a single query can put high load on the cluster. We had one experience where a task collected all the information about the VMs once every minute; it caused so many queries to the database that the whole cluster became very slow. Take care of the HAProxy configuration as well: the maximum connections for a backend sometimes fills up, and then new connections from clients fail.

For Ceph, you have to pay attention to the failure domain in your design. It may be host level or rack level, depending on how important the data is: if you spread replicas across racks and a whole rack goes down, you still have the data. Ceph depends heavily on the network infrastructure, so make sure you have a stable network connection with very low latency; otherwise the VMs will see high latency.

For the VMs, you have to take care of port security in OpenStack. By default, port security only allows packets whose IP and MAC addresses are known to OpenStack, so applications that change the IP or MAC of the packet cannot work while port security is enabled. Please either disable port security or use allowed address pairs on the port (a small example is shown after this part). Entropy for the VM is also very important: some applications running on Java or Oracle need entropy to start, and without enough entropy some applications cannot start or start slowly. And for the network performance of the virtual machine, please enable multiqueue for the VM. So that is what is running on the infrastructure.

The next part is the Kubernetes service. We chose to go with Cluster API to provide dedicated clusters and managed Kubernetes for now. As you know, Cluster API can provision dedicated clusters very well; however, to bring up managed Kubernetes clusters as a service we have to manage many things ourselves. Currently we serve dozens of Kubernetes clusters with over 500 nodes, over 5,000 CPUs, and over 6 terabytes of memory. That is not too big for now, but these Kubernetes clusters will grow soon, and we have learned a lot from them, so we can sum up a few things to share with you. As you know, we use Cluster API as the core of our Kubernetes service, and it can provision a dedicated cluster or a traditional cluster using kubeadm.
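To make the port security and VM tuning points above concrete, here is a small openstacksdk sketch. It is only an illustration of the knobs discussed in the talk, not the speakers' tooling: the port and image IDs are placeholders, the allowed-address-pairs call uses the standard Neutron extension, and the image properties for multiqueue (`hw_vif_multiqueue_enabled`) and the virtio RNG device (`hw_rng_model`) are standard Glance image properties. Passing those properties as extra keyword arguments to `update_image` assumes the SDK stores unknown attributes as image properties.

```python
# Sketch: adjust a Neutron port and a Glance image for the issues described above.
# IDs are placeholders; this is an illustration, not the presenters' deployment tooling.
import openstack

conn = openstack.connect(cloud="mycloud")   # assumes a clouds.yaml entry named "mycloud"

# Port security: instead of disabling it, allow the extra IP the application uses
# (for example, a floating VIP managed by keepalived inside the VMs).
conn.network.update_port(
    "PORT_ID",
    allowed_address_pairs=[{"ip_address": "10.10.0.100"}],
)

# VM tuning via image properties:
#   hw_vif_multiqueue_enabled=true  -> multiqueue virtio-net for better network throughput
#   hw_rng_model=virtio             -> virtio-rng so Java/Oracle guests get enough entropy
conn.image.update_image(
    "IMAGE_ID",
    hw_vif_multiqueue_enabled="true",
    hw_rng_model="virtio",
)
```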
We build our service on top of infrastructure powered by OpenStack, so some customization has to be made, such as the Cluster API provider for OpenStack and the OpenStack cloud controller manager used in our infrastructure. Because we provide Kubernetes to our customers, authentication and authorization, especially the connection from the client cluster to our infrastructure, needs attention. Monitoring, alerting, and logging, including audit logs for the control plane, are very important: monitoring to make sure we meet the service levels we announced to our customers, alerting to know when our operators should step in to resolve problems and save the service, and logging and audit logs for security reasons and for knowing what to do when a client's cluster faces an issue.

Another pain point we want to share: you may already know that etcd is really sensitive to latency, which is why we recommend keeping it under 10 milliseconds. In the past we were hit very hard by latency, and our cluster was not stable at all (a small latency probe sketch is shown at the end of this part).

Over the last four years we came to the conclusion that Kubernetes deployments should have a checklist: before deploying, while deploying, and after deploying. We have a long list of items in that checklist, and the major criteria are: functionality of the Kubernetes cluster, deployment architecture, security and high availability, backup and restore, monitoring and observability, and finally that your deployment should be fully documented, so other people who join your team can maintain the Kubernetes cluster.

The next thing we want to share is our database service. We provide databases such as MySQL, MariaDB, and PostgreSQL, and there are a lot of things to do here. We build our database services based on the OpenStack Trove project, and to make Trove fully functional we had to build our own task queue and workers for background task processing, create a solution for database backup, and remember to collect all the metrics related to the control plane and the management of the cluster.

So we have something to share about OpenStack Trove as a foundation for a database service. Trove is great, and it is even better if you make sure you have a failover solution for when your database gets into trouble, and pay attention to automatic backups for the database, both full and incremental. One of the most important things is bringing SSL support to the databases so customers can use them in a production environment.

We have the same story as with the Kubernetes deployment: you should have a checklist for DB deployment. In detail, our list contains over 40 items, covering things like the version (to make sure the version you currently use is still supported), the deployment architecture, right-sizing of your database, backup and restore, tuning the configuration (do not over-rely on the default configuration, it will hit you really hard), monitoring and observability, security (removing redundant users and enabling SSL/TLS support), and, last but not least, your deployment should be fully documented so that other people can join your project.
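On the etcd latency point above: what typically hurts is the fsync latency of the disk holding the etcd write-ahead log, so it is worth checking a node's disk before putting etcd on it. Below is a minimal, standard-library-only sketch of such a check; the sample count and path are arbitrary examples, not values from the talk.

```python
# Quick fsync latency probe for the disk that will hold the etcd WAL.
# Pure standard library; the path and sample count are arbitrary examples.
import os
import sys
import time

def fsync_latencies(path, samples=200):
    latencies = []
    fname = os.path.join(path, ".fsync_probe")
    with open(fname, "wb") as f:
        for _ in range(samples):
            start = time.perf_counter()
            f.write(b"x" * 4096)      # etcd WAL writes are small sequential appends
            f.flush()
            os.fsync(f.fileno())
            latencies.append(time.perf_counter() - start)
    os.remove(fname)
    return sorted(latencies)

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "."
    lat = fsync_latencies(path)
    p99 = lat[int(len(lat) * 0.99) - 1]
    print(f"p99 fsync latency on {path}: {p99 * 1000:.2f} ms")
    # Rule of thumb from the talk: keep this comfortably below 10 ms for etcd.
```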
In conclusion, thanks to the great open source community in Vietnam and all over the world; without you we could not have grown to this size and be standing here to talk about this. In this picture we want to share some of the ideas in our Vietnam community. The first one is about the community: as you know, we use open source, we love open source, and we contribute back to the open source community too. We learned a lot from the community; thank you for bringing so much knowledge and information that we could use to build our infrastructure and pass on to others in the open source community. The next thing we want to share is that you should build really good relationships with other companies, because other companies are not just rivals; they have a lot of great knowledge and insight that you can learn from. And the last thing is about colleges and universities: we have a strong relationship with colleges and universities in Vietnam, mentoring students, not just for us but for the community too, so other companies can also get really good human resources in cloud computing.

To finish the presentation, here are some takeaways; feel free to take a picture. The first is about infrastructure, the second about storage deployment and virtual machines, and then our sharing on the Kubernetes service and the DB service with OpenStack Trove. That's all for our presentation today, thank you very much. We do have some extra minutes, so if anyone has a question, you can use the mic over there.

Q: Thank you for the presentation, I think it was very helpful. You have built the cloud solution in Vietnam around OpenStack. If you were to take this to the next level, do you think you would remove OpenStack and focus more on Kubernetes running directly on bare metal, eliminating the entire OpenStack layer? Personally, from what I've seen in the industry, we're moving more and more away from OpenStack, and I wanted to know your thoughts about the future generation of cloud evolution your company has for the country. Thank you.

A: I have two ideas. The first thing we need to consider is what workload we run on Kubernetes, and whether that workload can run on a virtual machine, on OpenStack or VMware or whatever hypervisor. If the workload can run on a virtual machine, let it run on a virtual machine, because we have a layer that operates and manages the whole infrastructure, and provisioning, deploying, and terminating VMs is easier than doing the same with bare metal. The second idea is that OpenStack also has a project called Ironic that allows you to deploy bare metal, so one OpenStack can manage all the VMs and all the bare metal servers. Depending on the workload, you can deploy it on virtual machines or on bare metal. Is that okay? Actually, we are trying to make that work now. Thank you.

Q: A couple of questions, one quick one: what were you using for the underlying networking underneath Neutron? Was it OVN, OVS, or an ML2 driver for your network switches?

A: For now we are using Open vSwitch and the ML2 plugin for the whole infrastructure, but we are planning some lab work and a migration to OVN. Not yet; maybe next year, when the first cluster grows bigger, we can split off another cluster and perhaps deploy it with OVN.

Q: The second question may be a little longer. I saw you did a customized CSI for your storage driver. Was there a particular reason why you went customized instead of using the Cinder CSI?
A: Yes, related to this: our infrastructure had some customization before, and that's why we had to customize the CSI driver for it. We hope that in the future, when we use standard OpenStack, we can use the standard, default CSI driver. Thank you.

Q: I just want to say it's really impressive what you built in five years, really cool. Maybe this question is a talk for the next KubeCon, but I'm wondering what kind of methodologies you used over those five years to create all this. Did you use GitOps, or how did you implement DevOps, or something similar?

A: Thank you. We are running a kind of DevOps practice in our company, not actually GitOps, just DevOps and SRE. Thank you.

Q: Hello, thank you, this is very helpful. Just a couple of questions. One is about creating the Kubernetes cluster on top of OpenStack: when you create that cluster, do you create its etcd on OpenStack itself for each of those Kubernetes clusters?

A: Can you repeat the question?

Q: When you create a Kubernetes cluster on top of OpenStack, where do you run the etcd cluster? Do you create a highly available etcd for each of those Kubernetes clusters?

A: We have several options here. The first one is running the etcd cluster separately from the control plane, and the other option is running etcd together with the control plane. In the near future we will separate all the etcd clusters onto other clusters, not related to the control plane at all, to avoid some issues around this. Thank you.

Q: The next question: have you thought about running etcd differently, for example having k3s running inside Kubernetes for each of those clients?

A: Actually, we will not try that for now. Thank you.

Thank you very much for joining this session. Oh, you can ask here, because the time is running out. Thank you very much for joining this session.