Good to go. Okay, welcome to the talk "Scalable Clouds for Application Delivery on OpenStack". My name is Marcel Härry, I'm working at Swisscom. I'm leading the architecture for the Elastic Cloud, which we base on top of OpenStack, and the Application Cloud, which we deploy on top of it and base on Cloud Foundry. This talk will mainly give you insights into how we build things and what our lessons learned were. I have a background in system engineering and software development, so it's kind of like trying to bridge the two worlds. I'm not the only one: I'm part of an awesome team, and we're building, let's call it, the next-generation infrastructure for cloud-native applications and workloads. A few of us are also here, so we can speak together afterwards.

First, briefly something about Swisscom. Swisscom is a telecommunications company within Switzerland, but it also has bigger parts doing IT services and further services, all serving the Swiss market. It has quite a huge fixed-network coverage, probably providing broadband at home for most people, and the same goes for the mobile network. Swisscom has about 20,000 employees. So the market we serve covers mobile communication, fiber-optic broadband, traditional DSL, mobile broadband and also TV: Swisscom, for example, has a TV offering for residential customers.

Swisscom started a unified approach to cloud infrastructure and services. The idea is to build a 360-degree cloud, a cloud that can be used by everybody. Everybody means IT architects and users consuming services on top of it, but also application developers who can use the cloud, or components of this 360-degree cloud, to deploy applications faster, develop applications faster, have an environment where they can easily try things out, and then also have an easy step to move from, say, PoC and test to production.
Importantly, we try to base things on open standards to avoid lock-in; obviously OpenStack is one of the open standards that we think won't lock us in. Such a cloud should be scalable based on the requirements that the different users have: IT architects, someone deploying a new platform, or maybe it's the budget that will adjust your requirements. So, to keep it simple, it should be a standardized platform that is used within Swisscom but also by Swisscom customers, serving them and helping them to leverage their infrastructure.

Within that 360-degree cloud, one part is called the Elastic Cloud, including the Application Cloud which we deploy on top of it. The idea there is that with this Application Cloud we'd like to offer a platform as a service which eases the development of new applications within Swisscom, but also for customers. By reducing the complexity of building new infrastructure and new environments, we think it will be easier to iterate on new ideas and to try out new things, but in the end it will also shorten the time to market, meaning that we can iterate faster on new products and then also scale them out easily.

If you look at the requirements for success on a platform as a service, people usually refer to 12-factor applications. The 12-factor manifesto describes how you should build applications so that they are deployable within a platform as a service. Deploying is one thing; the other thing is scale-out, and if you stick to these rules, it will also be easier to scale out. One important point, and this is something that a lot of people already get in trouble with, is that a 12-factor application should completely ignore the local file system: it should not write to it, and it should also not depend on configuration data that lives on that file system.
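As a minimal sketch of that last point (my illustration, not code from the talk): the 12-factor manifesto says configuration belongs in the environment, not in files on the container's disk. The variable names below are made up for the example.

```python
import os

def load_config():
    """Read configuration from the environment instead of the local file
    system, so a restarted or rescheduled container behaves the same
    wherever it lands. Variable names here are hypothetical."""
    return {
        # Backing services are addressed by a URL taken from the environment...
        "database_url": os.environ["DATABASE_URL"],
        # ...and tunables get sensible defaults when unset.
        "http_port": int(os.environ.get("PORT", "8080")),
    }

# The platform (not a local config file) would inject this at deploy time.
os.environ["DATABASE_URL"] = "postgres://db.example.internal/blog"
config = load_config()
```

Because nothing is read from or written to the container's file system, the platform can kill this container and start an identical one elsewhere without losing configuration.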
So what you really want are reproducible environments that you can just throw away, restart and also scale out easily. If we have just units, or droplets, which is what they are called within Cloud Foundry, the platform as a service that we use, it's very easy to run these within containers, and it's very easy and fast to just run many containers. The application should, for example, also not have any shared state, or if it has shared state, it should push that out to an external service. A platform as a service is really something where you want to build tiny services; you might think of a microservices architecture, and one of your goals is to consume services and APIs.

In the end, an architecture will look more or less like this: web traffic hits a load balancer, maybe with SSL termination, and then we have a couple of Droplet Execution Agents, which are just VMs that run containers, and our application runs somewhere in one or multiple containers. The load balancer balances the traffic towards the containers. If a container fails, it will be restarted. If we realize that there is too much traffic or too much load on one container, or on the existing ten containers, we can simply scale out by telling the platform to start more containers.

Important here are the services: all the applications will also consume external services. As the applications are just stateless applications, which we scale out easily within containers, we still might want to store some kind of user data. Maybe someone signs up and writes a blog post, for example, in a simple application, so these blog posts need to be stored somewhere, and for that we need to consume services. A service can be anything that provides state, for example, but it can also be something like an email gateway, or an API that then sends an SMS, a short message, to your mobile number.
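A minimal sketch of that separation (my example, not code from the talk): the handler itself keeps no state, and every blog post goes straight to an external store. The `BlogStore` interface is hypothetical; in a real deployment it would be a database or cache service bound to the application.

```python
class BlogStore:
    """Stand-in for an external backing service (e.g. a bound database).
    An in-memory dict is used only so this sketch runs; a real app would
    talk to the service over the network."""
    def __init__(self):
        self._posts = {}

    def save(self, post_id, body):
        self._posts[post_id] = body

    def load(self, post_id):
        return self._posts[post_id]


def handle_create_post(store, post_id, body):
    """Stateless handler: nothing is kept in the container itself, so any
    of N identical containers can serve the next request."""
    store.save(post_id, body)
    return {"status": "created", "id": post_id}


store = BlogStore()  # in reality: one service shared by all containers
result = handle_create_post(store, "p1", "hello world")
```

Because the container holds no state, the platform can kill it, restart it, or add ten more behind the load balancer, and the blog posts survive in the external service.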
This is more or less the picture of how applications within an Application Cloud look, and I'm now going a little bit into detail on how we build the platform that runs these kinds of applications. As I mentioned, we built that on top of OpenStack, so we're using OpenStack to run Cloud Foundry. We have this elastic infrastructure-as-a-service layer below, which we built using Conta for the compute nodes, Arista for the physical fabric, and EMC ScaleIO as our SDS solution; on top of that we have Red Hat OpenStack, and as an SDN layer PlumGrid, which integrates very nicely with it. All these things that I just explained, which form our elastic infrastructure as a service, all the deployments, the configuration, the orchestration and so on, are managed using Puppet. We have quite a huge Puppet environment, or multiple Puppet environments, that manage the different parts; that includes the Arista switches, but also the setup of PlumGrid and the configuration of OpenStack and ScaleIO.

On top we have the platform-as-a-service layer, where we are using Cloud Foundry, the open source version. Swisscom is a gold member of the Cloud Foundry Foundation, and we also have a seat on the board, so we're taking part in shaping the direction in which the Cloud Foundry ecosystem and its open source project are going. Nevertheless, we add some modifications, or rather, modifications is the wrong word, extensions, that provide our integration with the surrounding Swisscom systems on top of Cloud Foundry. So we don't modify Cloud Foundry itself; we just extend it to integrate it. And for services, we also need to deliver persistent services, like relational databases, but maybe also message buses or just caches like a Redis cache, and at the moment we're heavily looking into leveraging Docker containers to deliver these kinds of external services to Cloud Foundry.
Cloud Foundry actually has its own container implementation, called Warden or Garden, in which it runs the applications, and for the services we're looking at the moment into using Docker containers for persistent services, which is something that most people don't do when they look into Docker. Most people, unfortunately, look into Docker to more or less build their own PaaS, and then they're again talking about stateless containers.

If we look at the network fabric, we have a very simple, standard physical leaf-spine fabric where we terminate layer three at the end of the rack. This is the way we think a data center should scale; I guess this is in common with what most people do and would agree on. Then we have OpenStack on top of it, and the traffic between the virtual world, meaning the workload networks within OpenStack, and the external connectivity flows through the PlumGrid gateways.

If we then look at the virtual topology that we deploy within OpenStack, we actually use multiple tenants, and multiple networks within these tenants, and we're using features provided by PlumGrid to interconnect these tenants. We came up with multiple tenants for different reasons. For example, we have different provisioning and orchestration systems talking to the OpenStack API, so we want some kind of isolation between them. We also like to separate application traffic into its own network within OpenStack, while keeping the control and internal API traffic of Cloud Foundry separate, which then eases managing iptables rules and so on within Cloud Foundry to add additional security features. The same goes for the services, which run in their own tenant so we can iterate on things there independently. All of this is then managed from a management tenant which is accessible from our management network.
Everything else is fronted by a load balancer which balances into the dynamic routers of Cloud Foundry, which are the first hop within the virtual network. We also use security features of these load balancers, because at that point we have the traffic unencrypted and can inspect it and defend against different kinds of attacks. We're running a public platform as a service as well, which means more or less anyone can upload any kind of application there.

So what have we learned on our journey? This journey has now been going on for quite a while. We made adjustments to our initial strategy, adjustments based on project requirements, on priorities, and so on. But something that is common when you go into a very traditional IT company to deploy cloud-native workloads is that you also want to change the way IT is being produced. A lot of people are speaking about the term DevOps. The good thing is we have quite some awareness within the company regarding that; it's something people stand behind, people agree that we need to reorganize the overall process of how we produce IT. But there's no schema you can just apply and say, I'm doing DevOps now. It is a cultural change that you need to bring into your company, and it's something we're still doing while we move on. We also still need to see how we can validate the life cycle that we build on top of our DevOps ideas, and what operating a platform built on top of that will look like.

Other challenges: OpenStack, Cloud Foundry, PlumGrid, all of them are distributed systems, and distributed systems are hard. If something fails, it's very hard to understand and tricky to investigate. So it is very important that you understand them correctly and have insight into all the different components, so that you realize how they work together and can master them.
Active-active high availability within OpenStack is still a big topic; high availability is a topic within different sessions here. Not all components can so far be run in an active-active deployment, and this is certainly something where we try to push the community, towards simple high-availability deployments rather than setups where we need bigger failovers.

I introduced this at the beginning: what is also very important, if you are moving to an application cloud delivery model for your new applications, is that the workload you want to deploy on top of such an environment must change. It will simply not work with lift and shift; you need to develop what most people call cloud-native applications. Very important, in our opinion: you need to automate everything, and you need to have very strong continuous integration and continuous deployment in place, otherwise you will just not be able to keep up with the pace at which systems like these are evolving. Cloud Foundry has its own, very frequent release cycle, OpenStack has its own release cycle, and so do the other parts, and you need to keep them continuously integrated, otherwise you will just stay on Havana or Grizzly or wherever.

When building or operating such a distributed system, what we also learned is that it's very important to make the orchestration transparent, and in some sense operationally friendly, in the sense that it is clear why something is orchestrated in a certain way. If you just have a black box that does something and you cannot understand it, it's very hard to debug in case of failure. It's also very good to involve all participants as early as possible, because people might hear "we're going to have a cloud in half a year" and their expectations might be completely different from what you're building. So it's very important that you get early feedback, that you talk with them, and that you also look at the designs of their applications, so
that you might influence their architecture, because if they build a traditional architecture, for example, it will simply not be something that is easy to deploy in such an environment.

Regarding future developments: we have a delivery coming up soon that is dedicated to the platform as a service on top of the Elastic Cloud. We're then certainly looking into how we can leverage more features of the OpenStack ecosystem that might not be directly important for the Application Cloud itself but might provide benefits for it, for example things like database as a service; if they evolve and mature, we will certainly look into adopting them as well. Cloud Foundry is the very first kind of deployment that we put on top of the Elastic Cloud, and Cloud Foundry is really built to be deployed on such an environment. Nevertheless, we see more workload coming that is built for such an environment, so we think this will not only be the ground for the Application Cloud, but for more cloud-native platforms as well.

If we look at OpenStack itself, people often think in terms of just one OpenStack installation: you deploy your platform into one OpenStack installation. What we see coming is rather an orchestration where we no longer deploy into only one OpenStack, into one cloud; instead we have a honeycomb of clouds into which we orchestrate our workload, and we can easily move it around. This gives us more resilience, but it also eases the life cycle of your OpenStack clouds, and the workload can then be moved around depending on current needs. Important for that is certainly something like federated access, which was part of one of the keynotes, but another thing is that we can federate networks among different clouds. Sometimes the traditional OpenStack approach of having floating IPs makes it not
easy to federate workload across different clouds, so having an easy way to federate networks among different clouds will make deployments much easier. This was just a small glance at how we think our journey will evolve; it's just one part, but I wanted to stress it here a little bit. Otherwise, I'm done, nearly in time. So, are there any questions? Could you use the microphone if you have questions? It's standing there, or I can repeat your question.

Question: You talked about requirements for success and you mentioned stateless applications. How about stateful applications, is that part of your requirements?

This goes back to the point that the workload must change. You can deploy stateful applications within Cloud Foundry, but the state will probably last, say, five minutes, because this is just the way the environment works: it will constantly kill your containers and boot them up somewhere else. So no, deploying traditional stateful containers in Cloud Foundry is not a good idea; this is not something you want to do. And this is also something we see a lot: people just try to push their existing applications into such an environment and then realize later, hey, it's not working, my container got killed, or, I need to write configuration into the file system and now you restarted the container somewhere else and my configuration is gone. So what we do is work together with the developers and with the people looking into the infrastructure: we run workshops with them up front, maybe even before they have deployed anything, specifically to introduce them to these new concepts, so that they also see how the existing architecture can change, because otherwise you will just not be successful. In my opinion, you should also not use it in the future if you have stateful containers. So the point I'd like to make is really: if you want to build cloud-native applications, you
need to do that separation of state: push state out to consumable services, so that your applications are just tiny applications that are spawned very fast, that can be scaled out and scaled down again, and that can be moved around. If you have state within the application, this is not possible; then you have the traditional deployment model.

Question: I have two questions, if you don't mind. Question number one is: how do you handle transactional operations, let's say payments that require consistency across multiple sites, in a stateless fashion? And question number two: you mentioned open source and configuration items, a CMDB. Can you elaborate what type of open source tools you use today for configuration items, a CMDB, that satisfy your requirements to build the Swisscom cloud?

I can take the last one first, where I can't really tell you much: I spoke about configuration management in my talk, and we use Puppet there, we use a lot of the open source Puppet ecosystem, but this is not a CMDB, so I cannot really tell you anything about that. The other part was how to deal with transactional state in these applications. This is something where you also need to adapt the architecture of your application. You might want to push a payment, say someone buys something in your web shop, out to a service that you consume, so your application is just talking to an API. The next question is then who is going to provide that API: is this something that you also want to deliver on top of the platform? And yes, you could build such a microservices architecture by deploying things like that on top of it as well. The question is just how you design the architecture of your application so that you can make an API call which only returns once the transaction is finished, as an example.

Question: It's not clear to me where you are in this journey.
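A rough sketch of the pattern described in that answer (my illustration; `PaymentService` and its methods are hypothetical, not a real Swisscom API): the application stays stateless and blocks on an API call that only returns once the external service has settled the transaction.

```python
class PaymentService:
    """Stand-in for an external, consumable payment API. In a real
    deployment this would be a network call to a bound service; the class
    and method names are made up for this sketch."""
    def charge(self, order_id, amount):
        # The external service owns the transactional state and the
        # consistency guarantees; the caller only sees the final outcome.
        return {"order_id": order_id, "amount": amount, "status": "settled"}


def checkout(payments, order_id, amount):
    """Stateless application code: no transaction state is kept in the
    container. The call returns only once the payment service reports the
    transaction as finished."""
    receipt = payments.charge(order_id, amount)
    if receipt["status"] != "settled":
        raise RuntimeError("payment not settled")
    return receipt


receipt = checkout(PaymentService(), "order-42", 19.90)
```

Since the container holds no transaction state, the platform can restart it mid-session without losing or duplicating a payment; that responsibility lives entirely in the consumed service.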
I understand you left the test phase, the proof-of-concept phase, and you are in production. One question would be: when did you leave the test phase and really start production? And also, how many nodes are involved in your Swisscom cloud right now, with OpenStack and so on?

We actually have multiple installations of different sizes, from very small ones to bigger ones. When did we go into production? The question is a little bit what kind of production, because if you look at the different layers, the layer you're building on top of needs to be stable for you; if the ground below is constantly failing, then you cannot build something on top of it. What we did do was open up our Application Cloud very early for developers, even when we were not yet sure how all these things would turn out, because the important thing is that they can play around with the platform, start developing applications on top of it using these concepts, and use the API for their CI and CD. We just told them: at the moment this is best effort. Over time we evolved and our infrastructure got more stable, and the time window is around now that we're really getting workload onto that environment and also offering, let's say, a production environment. Well, for us, the Application Cloud where people were already playing around was already production, because people were using it; it's just that the SLAs people might expect were still different. I think it's also important to rethink the idea of SLAs within environments like these, because you push a lot of things up to the application layer, and the application should be able to deal with failures and nodes, in these terms containers, going away. The platform itself will launch the containers again, but that's also why
you want a scaled-out application architecture. This means that in the end you don't get an SLA on exactly one container; you get an SLA on the platform itself, for the service that the platform provides to you, and that service is: I'm running your applications.

Any other questions? Otherwise we're done. Thanks for listening. I'm here if you have more questions.