Okay, good morning. First of all, I'd like to applaud you for being here so early after a party like last night's; that shows a lot of determination. My name is Annie Lai. I'm with the Huawei IT product line, and today, with my two colleagues Zhang Yu and Aiyou, we're going to talk to you about the Huawei public cloud.

As you know, Huawei has been serving carrier customers for more than 20 years, and for the last 12-plus years we have been helping carriers build data centers, all the way from L1 infrastructure up to application services. We have built over 400 data centers around the world, including the world's largest, for China Mobile, at 600,000 square meters. For the last five years we have also been helping our carrier customers build private and public clouds. Yesterday I gave a session on three telco OpenStack stories, and one of the use cases I discussed was carrier-enabled public cloud. Carriers are very interested in becoming the cloud service provider for their region. Some of you might think that carriers don't stand a chance in the US, because the US market is very tough: you have AWS, Google, and Azure. But outside the US, because of data-sovereignty requirements, and because many carriers are backed by their governments and already have relationships with the customers, they have a real chance of becoming the de facto public cloud provider for their markets. So we have been helping them build public clouds. Our flagship project was with StarHub in Singapore, serving the Singapore government as well as Singapore citizens; we started it over four years ago.

Because that was our first public cloud deployment, we partnered closely with StarHub, so we were both learning at the same time, with a team of our people helping StarHub operate the public cloud, and we learned a lot. After that we received many RFPs and RFIs asking Huawei to help build public clouds. But to become experts, as the phrase goes, we had to eat our own dog food, or take our own medicine: to be an expert, we have to be able to operate a public cloud ourselves. That is why we decided to start the Huawei public cloud, focused on China only, because we really want to become expert at operating a public cloud. We use China as our marketplace because we know that market very well, and outside of China we definitely don't want to compete against our carrier customers; we partner with them instead. That's our strategy.

We call the Huawei public cloud Huawei Cloud Services, or HWS, and we just launched it this summer. It is positioned as an enterprise-grade public cloud, offering infrastructure-as-a-service-plus, and targeting enterprise customers, government agencies, startups, and SMEs. The whole public cloud runs on Huawei's platform, FusionSphere, and in the middle of this year FusionSphere passed the OpenStack interoperability testing, so we acquired the OpenStack Powered license, which means FusionSphere is pure OpenStack. FusionSphere is a crown jewel, a core investment for Huawei, because we know that for our customers to deploy private and public clouds they need a highly scalable, interoperable platform, so we have doubled down on OpenStack. Globally we have five R&D centers: two in China, in Chengdu and Xi'an; one in Israel; one in Silicon Valley in the US; and one in Canada.
Altogether, we have over 2,000 engineers working with the OpenStack community. Even though we are only serving the China market, it is a large market. What's the population of China? 1.4 billion. China has 10% of the global developer community, which amounts to 1.9 million developers; China has the second-largest number of OpenStack developers, and Beijing is the city with the most OpenStack developers of any in the world. So we have a lot of developers who can really benefit from the Huawei public cloud, and the market is so large that there is no way we could serve it from a single data center. We have designated 17 data centers all over China, across seven regions, to run this one Huawei public cloud.

From a networking standpoint, we picked a three-tier topology. The most concentrated cities, such as Beijing and Shanghai, form the first-tier network regions; the second-tier regions are medium-sized cities; and the third-tier regions are farther out, used for things like disaster recovery and backup that do not require high network speed. All of the data centers are at least Tier III or Tier IV. Data centers are classed Tier I through Tier IV based on the availability they offer; Tier III and Tier IV data centers offer three to four nines of availability. If you're interested, you can go to www.hwclouds.com. Unfortunately it's in Chinese, but a Japanese audience might be able to read it; it's in kanji, right?
Once there, you can browse the services and products, and we have also included some customer use cases you can check out. This is the portfolio. As I said earlier, we focus on infrastructure as a service plus; in other words, the base services we offer are compute, storage, and networking, plus a PaaS-like environment to enable developers and ISVs, and in addition security and management services. Over time we will keep evolving and adding new services. On the customer side we have been very lucky and gotten good traction. These are just some representative examples; we have additional customers that are not listed. The verticals we consider low-hanging fruit are large enterprises, the public sector, financial services, media and entertainment, and small internet startups.

With that, I'm going to invite our architect, Zhang Yu, who was deeply involved in the design and deployment of the Huawei public cloud, to share his experience designing and deploying it.

I will show you some of the technical points of the Huawei public cloud. In general, we started our work from upstream open-source OpenStack, so first I want to say thank you to the community for providing such a great open-source project. Because we have OpenStack, we could start our work from a very high starting point. What I want to share today is what more we needed beyond open-source OpenStack itself. Let's look at this list; this is what we have from the community.
We have a lot of good code from many smart people, and upstream OpenStack provides many modules for building up a powerful system: rich functionality and open APIs. That's great, but it is not everything you need to build a public cloud; you need something more. Here is a brief list of the components, both hardware and software, that we needed to build an end-to-end public cloud solution, purely from the technical point of view. I won't talk about the marketing, operating, and business teams; we'll focus on the technology itself.

At the bottom, we need data center infrastructure: physical devices. On top of that, virtualization, software-defined storage, software-defined networking, and so on. Beyond the service part, we need a business support system to manage the service catalog, showing which services we can provide, plus account management, order management, billing, and so on. For the operations support system part, we need deployment, upgrading, monitoring, logging, alarming, notification, and so on; many things for operations. We need a portal, because although we can expose APIs to advanced users who can write programs, we also need to provide a portal for entry-level users. And beyond that, we need some higher-level service components, for example software to build up PaaS-level or even SaaS-level services. Beyond the components, we still have more requirements for the system.
Since this is a public cloud service, reliability, availability, performance, scalability, ease of maintenance, security, low cost, and user friendliness are all top-priority requirements; in fact it is very hard to judge whether one is more important than the others. All of the points shown on this page really matter, and we have to think about all of them. So, frankly, there is a gap between the upstream open-source code and an end-to-end public cloud service, and we need to overcome that gap. That is exactly what we did when designing, deploying, and operating the Huawei public cloud: trying our best to close it.

This is the high-level architecture of the Huawei public cloud. You can see that OpenStack is the kernel of the cloud platform; we use OpenStack to manage the virtualization and software-defined components, and under that we have our hardware, the data center infrastructure. Beyond OpenStack and the cloud platform module, we have the cloud service layer and the cloud BSS system. The cloud service layer wraps the low-level APIs and functions provided by OpenStack into higher-level, combined API operations exposed to the end user; that is why we introduced it. The cloud BSS handles the business side, for example order management, the service catalog, billing, and so on. On top of all of these modules we have consoles, two of them in fact: one for the end user, like the screenshot Annie showed, and one administrator-oriented console for administering the system. And spanning the whole system, we have an end-to-end cloud OSS.
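As an illustration of that wrapping layer, a single high-level user-facing call might combine several low-level OpenStack operations. This is a sketch of the idea only; the `client` object and its method names stand in for real OpenStack client bindings and are not Huawei's actual API.

```python
# Sketch of a "cloud service" wrapper: one combined operation exposed to
# the end user that chains several low-level calls (create a volume,
# boot a server, attach the volume). Method names are illustrative.

def launch_instance_with_disk(client, name, flavor, image, disk_gb):
    """High-level combined operation exposed to the end user."""
    volume = client.create_volume(size_gb=disk_gb)          # low-level call 1
    server = client.boot_server(name=name, flavor=flavor,
                                image=image)                # low-level call 2
    client.attach_volume(server, volume)                    # low-level call 3
    # The user sees one operation; a real implementation would also roll
    # back earlier steps if a later one fails.
    return server, volume
```

The point is only that the cloud-service layer owns the orchestration and error handling, so the portal and end users never have to sequence raw OpenStack calls themselves.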
The cloud OSS serves the operations and maintenance team.

Now let me talk about some of the challenges in building such a system. There are many, and I will not cover them all because our time is limited; I will just pick some that I think are interesting. The first is reliability. Because this is a service provided to the public, we expect it to be usable every day, every hour, even every minute and every second; it must be reliable. The second is scalability. This is a public cloud; we can start it at a modest scale, but it must be able to grow, because our business is growing. The third is ease of maintenance. For a public cloud, the price of your services is critical, because there are so many service providers in this domain. The service should be good but the price should be low, and to make the price low we have to cut OPEX first, so ease of maintenance is very important to cutting costs.

For reliability, we thought about many things, but the most critical principle is to build the system from components we are most familiar with. For the hypervisor, for storage, for networking, for each component of the system, choose the component you know best. For example, for virtualization we use FusionCompute, the enterprise-level virtualization management suite from Huawei itself. Because it is our own product, we know it very well; if something goes wrong we can fix the bug and diagnose the problem right away. That is the reason we use it. The storage part is a similar case.
We do not use third-party open-source software-defined storage; we use FusionStorage, which is also a Huawei product. We know it, and that is the most important reason we use it. I am not here to advertise our products; it is just a suggestion: if one day you build a system of your own, keep in mind that using what you are familiar with is really valuable.

Now for scalability. Most people here know that the scalability of a single OpenStack deployment is limited; it is very hard to grow a single deployment to a very large scale. But scalability is essential, so how do we address this? We use the OpenStack cascading technology created by Huawei, and we have contributed this technique to the community as an open-source project named Tricircle. Tricircle is an OpenStack cascading solution, used to aggregate a number of OpenStack deployments into one unified logical resource pool, presenting one large logical pool to the end user. With such a solution we can make a trade-off: a single OpenStack deployment does not need to be too large, because there is no real need for it to be. Each one can be, say, 500 physical servers, but we can have many such deployments and use cascading to aggregate all of them.

Another scalability problem in OpenStack is networking. If you are familiar with Neutron, you know that in the native upstream Neutron architecture there is a component called the network node, which does the Layer-3 traffic forwarding and provides DHCP and other network services.
But in large-scale deployments, such a centralized Layer-3 node is itself a problem: it introduces, first, a performance bottleneck, and second, a single point of failure. So in the Huawei public cloud we removed the network node and use a distributed virtual networking solution. For distributed routing we use DVR, and to my understanding we are at least one of the first teams to use DVR in a production-level environment. We believe it is usable; maybe it needs some consolidation, meaning we need to improve it and fix some bugs, but it is usable. Beyond DVR, we also use distributed DHCP: there is no centralized DHCP agent on a network node. Instead we distribute the DHCP mechanism to each compute node, so that the DHCP service on each node only serves the DHCP requests sent out within the scope of that node. Distributed DHCP is not yet a feature in upstream OpenStack, but Huawei is now working hard on contributing it back to the community, and we plan to land it in the Mitaka release.

For ease of maintenance, we provide many components in our HWS system to make the system more understandable. It has to be understandable, because it is so complicated: we have thousands of servers and thousands of components of various types, software and hardware. To make it understandable we introduced some tools. The first is Zabbix, which we use to monitor the performance of the system, covering both the hardware and the critical OpenStack components themselves; for example, we use Zabbix to monitor the CPU and memory utilization of each OpenStack component, such as nova-api, nova-conductor, nova-scheduler, cinder-api, and so on.
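As a sketch of the kind of check this monitoring enables, an alert typically fires only when a component's utilization stays above a threshold for several consecutive samples, so brief spikes do not page anyone. In a real deployment this is expressed as a Zabbix trigger rather than hand-written code; the sample data and thresholds below are invented for illustration.

```python
# Sustained-threshold alert, the kind of condition a monitoring system
# like Zabbix evaluates over collected samples (e.g. nova-api CPU%).

def sustained_breach(samples, threshold, min_consecutive):
    """True if `samples` contains `min_consecutive` consecutive values
    strictly above `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False

# A brief spike should not alert; a sustained breach should.
spike = [20, 95, 30, 25, 22]
sustained = [20, 92, 95, 97, 91]
print(sustained_breach(spike, 90, 3))       # False
print(sustained_breach(sustained, 90, 3))   # True
```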
Also, logs. Logs are always an important topic. Every day we talk about big data; well, what data is big? Logs are a very large data set. By using an ELK solution to collect all of the log data into one data management system, you can do a lot of analysis on it and understand the status of the system very well. On top of ELK we provide a status-analysis component that automatically analyzes the collected content, especially in the Elasticsearch cluster, to determine whether or not the OpenStack cluster is working well.

We also have an automatic testing trigger: we periodically issue API calls to the system to see whether the service is still usable. For example, we periodically and automatically create a VM to verify that the Nova service still works. This is a trick for checking the functional correctness of the system.

Next, I will hand the time over to Aiyou.

Thank you so much. Before I bring him up,
I just want to make a comment. As you know, the current OpenStack release is really not enough to operate such a large-scale public cloud; that is why we had to add a lot of extensions to make it work. It is our goal to eventually upstream all of these advanced features and functionalities. In particular, Tricircle is a large-scale project, and we'd like to invite you to come join us and make Tricircle a success, so that everybody can benefit from our work. And now I'd like to invite our CTO from Israel, Aiyou, who is going to talk about a lot of the exciting projects he is working on. Hopefully these projects will make the Huawei public cloud a lot more competitive and a lot more useful for our customers.

Okay, thank you all for coming. First of all, it's very important to note, and not many people know this, that Huawei is really active in the community. We are not just using OpenStack; we are actively contributing back everything we are learning from our public cloud experience, as well as from our private cloud distribution. Up until now we have contributed fixes for bugs we found, performance improvements, and improvements to DVR, and as Zhang Yu mentioned, we are also contributing back the distributed DHCP based on DVR technology. But in fact we are doing a lot more than that. A quick show of hands: how many of you have heard of project Kuryr? Yeah, nice, keep your hands up. How many had heard of Dragonflow, or of Tricircle, before coming here today? Okay.
What you may not know is that all of these projects were initiated by Huawei, some of them in collaboration with other companies such as Midokura, some of them alone. In addition, we pushed the service function chaining functionality into Neutron. At this summit we are launching a new project called Smaug, which is focused on data protection, and, not today but soon, you can expect to hear more about a hybrid cloud project that we are going to push. All of this is, of course, upstream and open source. So, as you can understand, we focus on upstream; we are committed to leading innovation in the community, to driving OpenStack to become an enterprise-grade solution upstream, and to making sure that everything you need is under the OpenStack umbrella.

What I'm going to focus on for the rest of this session is Neutron networking. As Zhang Yu mentioned, networking is still a mostly unsolved problem in OpenStack; there are lots of issues with the reference implementation, and what we want to do is make the reference implementation usable and scalable. To do this we introduced project Dragonflow. What is Dragonflow? Dragonflow is an SDN controller. Now you may be thinking, and if you're not thinking this you should be: why the hell do we need yet another SDN controller?
There are so many in open source already. Well, there are two major differences between Dragonflow and any other SDN controller you're familiar with. The first is that it is developed entirely under OpenStack, and it is the only SDN controller being developed under OpenStack; but that alone is not a compelling enough reason to build a new SDN controller. The second major difference is that all the other SDN controllers you're familiar with decided to build a distributed database by themselves, so that the clustering and the HA mechanism are all built into the controller, and once they do that, they're stuck with it. Making such a database enterprise-grade is a three-to-five-year effort. With Dragonflow we decided to say: no, that's someone else's problem. People have already solved it. We just make the database pluggable and reuse other people's work, like Cassandra, like RAMCloud, and so on, and all we need to focus on is the networking parts. When Zhang Yu said earlier that ease of maintenance is critical to keeping OPEX down, this is it, and I will touch on this a little more.

So Dragonflow is a distributed SDN controller; the controller itself runs on the compute nodes, as you can see, in a distributed manner. If we look a little under the hood, you see the pluggable database layer; we already have support for RethinkDB, RAMCloud, and others, and adding a driver for another pluggable database is one day's work. It's really simple. Above that you can see the application layer. Today Dragonflow supports most of the Neutron functionality.
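To illustrate why a new database driver is cheap: the controller only needs a small key-value contract from its backend, so a driver reduces to implementing a handful of methods. This is a sketch of the idea only; the interface and method names below are hypothetical and are not Dragonflow's actual driver API.

```python
# Illustrative pluggable-database contract: the controller codes against
# this small interface, and each backend (RethinkDB, RAMCloud, etc.)
# supplies a driver that satisfies it.

from abc import ABC, abstractmethod

class DbDriver(ABC):
    """Minimal contract a backend must satisfy."""

    @abstractmethod
    def set_key(self, table, key, value):
        """Store `value` under (table, key)."""

    @abstractmethod
    def get_key(self, table, key):
        """Return the stored value, or None if absent."""

    @abstractmethod
    def get_all(self, table):
        """Return {key: value} for a whole table (e.g. all ports)."""

class InMemoryDriver(DbDriver):
    """Toy backend, here only to show how small a driver can be."""

    def __init__(self):
        self._tables = {}

    def set_key(self, table, key, value):
        self._tables.setdefault(table, {})[key] = value

    def get_key(self, table, key):
        return self._tables.get(table, {}).get(key)

    def get_all(self, table):
        return dict(self._tables.get(table, {}))
```

Because clustering, replication, and HA live behind this boundary in the backend itself, swapping databases never touches the networking logic.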
We have Layer 2 support, distributed Layer 3 support, and distributed DHCP, which I will come back to later, and we have a lot of plans for more features. We initiated Dragonflow a little more than a year ago; we presented it at the Paris summit, today it is already being used in production environments, and our target is to get it into the Huawei public cloud soon.

From the feature point of view, as I mentioned, we support the Layer 2 core API, IPv4, IPv6, and the major tunneling protocols, plus of course the distributed Layer 3 virtual router and distributed DHCP. When I said ease of maintenance: the distributed DHCP is less than 400 lines of code. The first version took us one day to develop and was 300 lines; with debugging, testing, and so on, it now works at 400 lines. That is ease of maintenance. The whole of Dragonflow is less than 3,000 lines of code, all of this functionality in less code than DVR itself, which is a single feature. We also have pluggable distributed databases and selective database distribution; maybe I will get to that if I have time.

On the roadmap, we want reactive database distribution. What you need to understand is that the database is a critical factor in any SDN controller.
It really is the point that determines your scalability, for two reasons. The database is what distributes the network topology to the compute nodes, so how fast it can spread that information to all the compute nodes determines your scalability; in addition, the less data you need to distribute to each compute node, the more nodes you can scale to. With reactive database distribution, the compute nodes go to the database only when they need information and ask for it, instead of proactively receiving all the information all the time as in other distributed SDN controllers, so we can scale a lot more. Container support: as I said, we are leading the Kuryr project, and of course we are going to support containers in Dragonflow. Also service chaining support, distributed DNAT and SNAT, which are being worked on already, and offloading to hardware components to get better performance.

Okay, so as I said, it took us one day to get the first version of distributed DHCP. But why do we even need to distribute DHCP? The first thing to understand is that there is no dynamic allocation of IPs in OpenStack through DHCP: Neutron is the IP address manager. Neutron decides the IP address and the MAC address of every virtual machine in your environment, so all we need to do is push this information into the virtual machines; we don't need a real DHCP server. Now, how does DHCP work without Dragonflow? For every virtual network you have, Neutron sets up a DHCP server on the network node. This means that if I have a hundred tenants with an average of ten subnets per tenant,
I have a thousand DHCP servers running in my environment. And if we use the DVR architecture to distribute routing, we actually make this problem worse: although we are distributed, we now have even more DHCP servers in the environment. This is a problem that goes straight to ease of maintenance and increases our OPEX. How do we solve it? On the right-hand side you can see the general DHCP protocol, and on the left-hand side you can see how Dragonflow takes care of it. Basically, we introduce a flow into OVS that hijacks DHCP requests locally and sends them to a DHCP application in Dragonflow, which creates a DHCP response and returns it to the virtual machine locally, with no traffic over the network. It is all local on your compute node: because the database has already distributed the topology to Dragonflow, we already have all the information we need.

How am I doing with time? Okay, I still have a few more minutes. I already covered pluggable databases.
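Coming back to the distributed DHCP path just described: since Neutron has already assigned each port's IP and MAC, the local application can answer a hijacked request purely from the topology the database pushed to the node. This is a sketch of the idea only; the port-table layout and field names are assumptions for illustration, not Dragonflow's actual schema.

```python
# Sketch of a per-node DHCP responder: answer a hijacked DHCP request
# from the locally cached topology, so no packet leaves the host.
# The dictionary schema here is illustrative, not Dragonflow's model.

class LocalDhcpApp:
    def __init__(self, port_table):
        # port_table: client MAC -> {"ip": ..., "gateway": ..., "lease": secs},
        # populated by the (selectively) distributed database.
        self._ports = port_table

    def handle_request(self, client_mac):
        """Return the DHCP answer fields for a known MAC, else None."""
        entry = self._ports.get(client_mac)
        if entry is None:
            return None        # unknown VM: nothing to offer locally
        return {
            "your_ip": entry["ip"],
            "router": entry["gateway"],
            "lease_time": entry.get("lease", 86400),
        }

app = LocalDhcpApp({"fa:16:3e:00:00:01": {"ip": "10.0.0.5",
                                          "gateway": "10.0.0.1"}})
print(app.handle_request("fa:16:3e:00:00:01")["your_ip"])   # 10.0.0.5
```

The OVS flow that redirects the request to this application, and the encoding of the answer into an actual DHCP packet, are omitted here.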
I will just mention that building a database is something that takes years; getting a clustered system to stabilize takes a long time, which is why, as I said, we made it pluggable. What I would like to talk about now is selective distribution. To really scale, and this is a very important point, you need to limit the amount of data that you distribute to all the compute nodes. If I want to scale to 10,000 physical servers, then even if I only need to distribute one megabyte to each of them, doing that continuously takes a long time and puts big pressure on the network. So instead we want to limit the amount of data we distribute. In this use case you can see two compute nodes running virtual machines whose network topologies have no connection with each other: VM 1 and VM 2 are on one network, VM 3 and VM 4 are on a different network. Compute node 1 really does not need to know about the network topology on the right side, and vice versa, compute node 2 does not need to know about the topology on the left side. This is selective distribution: we have a publish-subscribe model in the database, which, if you noticed, is one reason we support RethinkDB, and each node subscribes only to the tenant information that corresponds to the virtual machines it is running. This way we limit the amount of data we need to distribute to each compute node and make the solution more scalable.

And the last point, of course, is containers. You've heard of Kuryr; Kuryr is a project that is trying to standardize container networking in a hybrid environment. So what does a hybrid environment mean? Yes, exactly.
It means the ability to run Docker containers inside a virtual machine, alongside regular virtual machines that are serving our workloads. The problem is that setting up the networking for containers that are nested like that is difficult. They are nested for security reasons; no one in their right mind today is deploying containers on bare metal, they are doing it inside per-tenant virtual machines. So we still have the problem: how do we get IP addresses, and how do we manage the network topology, for those containers inside the VMs? They can reside in different subnets, and so on. Kuryr set out to standardize this. There are other projects doing this, like Flannel and Weave, but we want all of them to comply with the Neutron network model, because it is already there and it gives us a unified way of managing the network, and Dragonflow is going to be one of the first network solutions that supports Kuryr in this way. And I think that is it. Questions?

Yeah, so as you can see, in running such a large-scale public cloud we have become a user of OpenStack ourselves, and that is pushing OpenStack forward and pushing Huawei to be more innovative. We need all of these projects to make such a large-scale public cloud work properly, and we need your help: we can't do all these projects alone. We'd like to invite you to come join us, and if you are interested in any of these projects, come talk to Aiyou. Now I think we can take a couple of questions. Please be brave, ask away.

Question: How many nodes do you have in your public cloud?

I'm sorry, I'm not authorized to talk about that; it's business-sensitive. There are 17 physical data centers, but it's a large system.
We use Juno. Yes, Juno. In fact, we deployed our public cloud based on FusionSphere version 5.1, which is a Huawei product version based on Juno. We have improved on the upstream code, for example with extensions to improve reliability and so on.

Question: How often do you pull updates from trunk?

That's a good question. In general, the Huawei public cloud is maintained in a CI/CD mode, so we sync with upstream frequently.

Question: Can Huawei expand its cloud services to the United States? The reason I'm asking is that the US government prohibited Huawei from selling routers and much other network hardware in the United States, and I wonder whether you can provide services there.

Well, as I mentioned earlier, the reason we are doing a public cloud in China is to help our carrier customers enable their own public clouds. It is not our ambition to become a global public cloud service provider; that is why we sell only in the China market, so we can gain the experience, become true experts, and help our carrier customers. However, if you are interested, we have carrier partners all over the world, and we can definitely have our carrier customers serve you in your market.

If there are no more questions: well, thank you so much for your time.