Hello, everybody. It's great to be here. I'm from Orange. We're going to talk a little about Cloud Foundry, BOSH, and several other tools from the OSS community; I think you already know a lot of them, of course. And thanks for coming to this session. It's based on raw sharing from a user's point of view: what we needed to go from scratch, meaning no BOSH at all, up to having a multi-site Cloud Foundry deployment in production for a few apps.

So, I am Fabien Guichard. I am a cloud operations leader at Orange. I am technical and a DevOps addict, which you probably should be if you want to dig into open source products and manage, with many teammates, some of whom are here, a multi-data-center Cloud Foundry deployment. So it's not, of course, only me. I have high-quality teammates who make, as you will see, key contributions to the community, helping us deal with and operate Cloud Foundry in production.

A few words about Orange. We are a European telco company with more than 265 million customers around the world. In France, we have more than 30 million customers. The trouble is that you never know when they will connect to your services, because it's based on contracts. So you can have fewer customers connected, but also more than 30 million. In terms of scaling, you will see it's pretty tough to operate that kind of platform. So we are pretty big, and we saw that we had the scale to benefit from automation and from PaaS usage.

We are a new member of the Cloud Foundry Foundation; we joined in June 2017. We came to the foundation because Cloud Foundry helped us go smoothly to cloud native and adopt some best practices, and we have had contributions since 2012. So we have been users for a pretty long time, and we felt it would be a good way to give back to the community by being part of the foundation.

The division I work in is the Digital Factory inside Orange. Basically, we are working for the Orange France marketing teams, developing and operating software for them. The interesting part for an operator is that we have more than 100 million page views per day, which means you have some scaling problems to deal with. It makes things very interesting. Like every online advertising business model, management has some very big expectations, and it costs a lot. The Digital Factory has also developed, over a long time, an open source software culture, meaning we are used to dealing with open source communities, because basically you don't have to pay a lot, you know, in terms of product software.

A few words on the footprint: three data centers in France; more or less, sorry, the extract from the CMDB was not great for me, around 300 applications; and three Cloud Foundry production deployments based on the open source distribution. And right now, a few of the services on the French portal, so if you want to have a look on your smartphone or whatever, you can just go, are being run by Cloud Foundry.

A little disclaimer: other divisions in Orange are using proprietary and vendor distributions; some of them are using Red Hat products, for example. So this is very much the view from the Digital Factory. It doesn't mean Orange is using open source Cloud Foundry everywhere. And it's a pretty technical talk.
So if I am not, you know, quite clear, do not hesitate to ask questions at the end of the talk. I'll try to make myself as clear as I can.

So, from scratch. The first thing we had when we started the journey was an IaaS. We have Apache CloudStack as the IaaS. That's great, it's, you know, 100% open source software, but it's not OpenStack. And we have Xen as the hypervisor. So Xen with CloudStack; probably nobody else in the world had that combination at the time. So we started from that. It's really from scratch.

The first thing we needed, to deploy a single BOSH release, cf-release at that time, was a CPI. Basically, if you don't have a CPI, you just can't push and deploy a BOSH deployment. So we contributed the first external CPI in 2015, a CPI for CloudStack, basically. And to go further, you also have to have a stemcell. So we had to produce our own stemcell for CloudStack with Xen. It's still not upstream, but we are working hard to make it upstream very soon. Once you have that, you are able to bosh-init; you are able to deploy your first Cloud Foundry. So it was pretty quick then, and we got to move the first cloud-ready application onto Cloud Foundry: the first developer, in November, making the very first cf push and saying, okay, it's working. Four months of work just to have a single, simple cf push working. But it was very interesting to do.

And then the problems start. People realized that you can't really go to production just like that. We had security teams coming onto the field asking: where does the stemcell come from? We have a lot of internal tools trying to rank the security level of assets coming from open source communities; basically, you could have some backdoors that kind of sneak into the code, so they try to detect that. They were asking how the buildpack builds the droplet; you know, you could just pull in some special libraries that open a door to the internet. And the rootfs also: how do you patch it? What kind of libraries does it have? So there were a lot of security assessments, because Cloud Foundry was not really known. The security work inside the community is pretty strong; they're doing great work. But we had to demonstrate that to begin with.

The second pretty big piece of work was multi-site, which means you have to decide how to do HA. There are two ways to do that. You stretch your database, and you have to deal with the CAP theorem, basically, so latency problems. The other one, the one we chose, was to have three separate Cloud Controller databases, one per data center, and a deployment pattern based on CD, with Terraform as you will see later, to synchronize and make the three data centers converge in terms of configuration. Those were the challenges we took on at the time, and you will see that it works.

So at last we were able to run a multi-DC Cloud Foundry production, having answered the security assessments and decided how we would do HA. And the first people came with their applications and started saying: we have trouble with performance. So we had two months of work to show what a performance problem means in terms of the application and in terms of externally configured devices. A lot of things to explain to our teammates and our colleagues.

I wanted to show the first schema we had. It's pretty basic.
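Before we get to the schema: to give a rough idea of the bootstrap workflow I just described, here is a minimal sketch using the old BOSH v1 CLI we had at the time. The file names and addresses are hypothetical; the CloudStack CPI and the CloudStack/Xen stemcell are the pieces we had to build ourselves.

```shell
# Point the CLI at the director we bootstrapped on CloudStack (address made up)
bosh target https://10.0.0.6:25555

# Upload our home-built CloudStack/Xen stemcell (still not upstream)
bosh upload stemcell bosh-stemcell-cloudstack-xen-ubuntu-trusty.tgz

# Upload cf-release and deploy Cloud Foundry
bosh upload release cf-release.tgz
bosh deployment cf-manifest.yml
bosh deploy

# Four months of work later: the very first push
cf api https://api.cf.example.internal --skip-ssl-validation
cf push my-first-app
```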
I mean, if you go to the open source documentation, you will see that you have to separate, basically, BOSH and the control plane from the data plane. In red and in orange, you have the data plane parts. That's where we take the queries from the internet and drain them into our own data centers. So that's where the bad guys might be and might come into your house. The other part is networks that are accessed only by internal people, so you can manage them in a different way in terms of security.

All the separation is done with a very old technology, for those who know it: Ethernet segment network separation, I mean VLANs, the 802.1Q specification. So we are at a very low level in terms of separation. A lot of deployments are not done like that nowadays, especially for people going to public cloud, but as we have our own data centers, we are able to implement that kind of pattern. You know, that's the way we did it in the legacy, so we reproduced it here. And you have the same thing in every data center. Sorry, my mic.

On the right part of the schema, you have the marketplace VLANs; each service offer has its own network. So, you know, the firewalls route everything and authorize, or not, each kind of service talking to another. And we have a VLAN with BOSH, Concourse, Prometheus, you know, all together. It's the same thing in the three different data centers.

The special part, the one that makes it possible to have an application on three different data centers running on Cloud Foundry, is the device on top: F5 devices, load-balancing devices with GTM, Global Traffic Manager. They work with external DNS to provide a kind of multi-data-center resiliency. But it's important: those devices are not managed in any way, neither by BOSH nor by Cloud Foundry. You have to provide them yourself and keep their configuration consistent with your domains and your routers inside Cloud Foundry.

So, the first issues we had, the ones at the top of my mind. The first one: we are an operations team, so we were not used to Git, Git operations, Git flows, that kind of stuff. And the first thing we had to deal with was, basically: you commit, you know, one configuration; how do you roll back without destroying everything in the Git tree, for example? We had to learn that kind of thing. And I think people going to on-premise Cloud Foundry deployments shouldn't underestimate the training they need to do with the operations team on that. I made the error myself, so I promise you: you never have too much training in Git operations. I don't mean Git flows, but Git operations themselves: how you do a revert, when you have to do it, that kind of thing.

The second thing was a problem with Garden last year, in 2016. When you are using the Xen hypervisor, you install the Xen tools, and they need an exclusive mount of /proc to talk to the hypervisor. Unfortunately, Garden needs the same thing to work on the Diego cell. So with the Xen tools there was a problem: both of them wanted an exclusive mount on /proc, which meant Diego was not able to bootstrap its processes and we were not able to deliver any kind of container. The issue has since been corrected, but it was a second problem.

And the two last ones are not on the operations side, but more on the developer side.
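First, though, going back to that Git operations point: a minimal sketch of the kind of rollback we had to learn, reverting a bad configuration commit without rewriting the Git tree. The paths and commit hash are hypothetical.

```shell
# Find the commit that introduced the bad configuration
git log --oneline -- deployments/cf/

# Create a new commit that undoes it, instead of rewriting history
# (never force-push a shared operations repository)
git revert abc1234

# Push; the CD pipeline picks up the new commit and converges
git push origin master
```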
When you push an application to Cloud Foundry, the first thing you have to deal with is: okay, how do you connect to a database? Where is the configuration, and how? And you have to make your developers understand that they need to change their code, meaning that if you want to expose a service, the port you listen on in your code is not a variable or a config file; it's exposed in an environment variable called PORT. And the same goes for the services: connecting to Redis, RabbitMQ, Postgres, MySQL, whatever. So they have to change a little of their code, and that's what we call the cloud-ready migration: just being able to start with a different way of getting the port number in your code and getting the connection strings for your database. It's really the thing you start with, basically. That's what we call going cloud-ready: the very first step, changing your code to do that.

The second thing was with our front-end developers. Basically, they just do CSS, JavaScript and HTML modifications: they hit F5 on the keyboard and two seconds later they have the result. They were doing the same thing with cf push, and obviously it was 30 or 40 seconds. "Okay, Cloud Foundry is just not for me." Wow, what are you saying? So we have a plugin, a CF CLI plugin called sync, that just lets your local environment synchronize with the running containers without having to restage your files. It was nothing that big, but in terms of developer experience for the front-end developers, it was crucial for getting them to be part of the project. So, great, a lot of feedback, that's cool.

So now, how do we go further towards production? We have seen what you have to do to begin with, to be credible, but now we want some real end users using applications running on Cloud Foundry. The thing is, we decided not to be in production all the time, because the internet is a pretty wild world when you are a tier-one operator like Orange; you know that. When you're not confident in something, you just don't move full-time to production, basically. So we decided to be in production only during business hours. That began in September 2016, so pretty long ago.

The first thing we faced was how you manage multi-site configuration for your end users. Cloud Foundry is a self-service platform in its philosophy, and when you have multiple deployments, people kept asking; I mean, we had too many pull requests, too many manual tasks: provisioning orgs, spaces, routes, domains, quotas, everywhere. And we were not very confident when somebody asked: I pushed my application on data center one and on data center two; how do you say for sure that the organization, the space, the buildpacks are going to be the same? "I tell you, it will be the same." No, no, no. It's not going to work that way.

So the first thing we had to solve was: how do you manage consistent configuration across different data centers? As I told you in the disclaimer, we decided to have separate Cloud Controller databases and to use Terraform, being able to take some Terraform files and say: if you have this in source control management, I promise you will have the same thing, or at least Terraform will try to converge to the same resources, in your production configuration. So we open sourced a Terraform provider for Cloud Foundry configuration, and you will see it brings a lot of resources to manage your Cloud Foundry configuration. The second thing was to have, of course, Prometheus. Thanks, Ferran, wonderful job.
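Before going on: to make the cloud-ready migration from earlier concrete, here is a minimal sketch, in Go, of the two code changes involved, reading the listen port from the PORT environment variable and reading bound-service credentials from VCAP_SERVICES. This is my illustration, not code from the talk; in real applications a community library such as go-cfenv can do this parsing for you.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	// Cloud Foundry injects the listen port; do not hardcode it in a config file.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080" // fallback for local development
	}

	// Credentials for bound services (Redis, RabbitMQ, Postgres, MySQL, ...)
	// arrive as JSON in VCAP_SERVICES, keyed by service label.
	var services map[string][]struct {
		Name        string         `json:"name"`
		Credentials map[string]any `json:"credentials"`
	}
	if raw := os.Getenv("VCAP_SERVICES"); raw != "" {
		if err := json.Unmarshal([]byte(raw), &services); err != nil {
			log.Fatalf("parsing VCAP_SERVICES: %v", err)
		}
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "bound service types: %d\n", len(services))
	})
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```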
So we see that Prometheus is a must-have tool if you are operating a large-scale Cloud Foundry deployment. And the last one, of course: Concourse. No more manual bosh upload release, bosh deploy. It was fun, but it's not a good way to manage and operate your platform over time. So you basically have to go to Concourse.

Once we did that, we were more confident, and since May of this year the full services, you have the URLs on the slide, are on the internet full-time, 24 hours, seven days a week.

Scaling: three parts. The most interesting, no, they are all interesting, but the most challenging one: Prometheus federation, how you aggregate metrics across the architecture. Then Terraform, which I already told you about, and Concourse.

Two things that I think should not be underestimated. The first one: we have our old legacy ticketing system. You don't just come out of the field and say, I'm going to scratch everything. No, we had to provide interfaces, pretty simple ones, with Mattermost, the open source alternative to Slack, basically, and with Xymon, yes, we are also using Xymon, along with Apache CloudStack and Xen servers; we like it that way, the open source way. So we had to provide bridges to those tools. The second was the security team, again, asking: you have a platform that deals with so many languages and so many different technologies, Apache, PHP, Tomcat, Java, native, Go, Ruby, all on the same platform. So how do you manage incoming requests from customers, or maybe from bad guys? You have to look into WAFs, web application firewalls. There are two ways to do that. The first is to integrate them into the buildpack; we have an issue open on the PHP buildpack for that, with ModSecurity for Apache, basically. The second is to buy or implement, on your external load-balancing devices, a WAF covering all technologies. Both have their drawbacks, and you can deal with both at the same time. We don't have any kind of definitive answer there.

Quickly, Terraform. This is one of the Terraform files from production; a hedged reconstruction of such a file follows below. You will see resources, I can't see it well from here, but basically you have organizations, spaces, and the security groups that you apply on your spaces to have generic firewall rules opened. You see on the right side all the resources, and many more, that can be managed with Terraform, and you have the URL at the bottom of the slide. Without that continuous deployment of configuration with Terraform, when you scale and you have greater velocity, it's pretty tough to be sure that the configuration is in the exact same state on every data center, because we do not stretch and do not share the Cloud Controller database.

Concourse. Sorry, the acceptance tests are red, but it's real life. It's a pretty simple pipeline, but we started with that, and it's important to keep in mind the KISS principle, keep it simple, and just start with simple things; there's a pipeline sketch below as well. So: deploy Cloud Foundry with Diego, and run the smoke tests each time you make any kind of modification, scaling the number of jobs, changing a release, changing a stemcell, changing the configuration of a job, whatever, and have the acceptance tests run after that. I think it's a very important, and maybe obvious, pipeline for ops to have.

Prometheus. Ferran again, wonderful job.
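Back to that Terraform file for a second. This is my reconstruction of what such a production file can look like, not the actual slide; the resource names follow the community Cloud Foundry provider and may differ between provider versions, and all org, space and rule values here are made up.

```hcl
# Hypothetical multi-DC Cloud Foundry configuration, applied identically
# to each data center's Cloud Controller by the CD pipeline.

resource "cloudfoundry_org" "digital_factory" {
  name = "digital-factory"
}

resource "cloudfoundry_space" "portal_prod" {
  name = "portal-prod"
  org  = cloudfoundry_org.digital_factory.id
}

# Generic firewall rules opened for the apps in the space
resource "cloudfoundry_asg" "portal_egress" {
  name = "portal-egress"

  rule {
    protocol    = "tcp"
    destination = "10.10.0.0/16" # internal services VLAN (made up)
    ports       = "5432,6379"    # Postgres, Redis
  }
}
```

Running terraform plan against each data center then shows any drift from the committed configuration, before terraform apply converges it.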
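And similarly, a minimal sketch of that kind of Concourse pipeline, deploy, then smoke tests, with hypothetical repository and task names:

```yaml
resources:
- name: cf-manifests              # Git repository holding the BOSH manifests (hypothetical)
  type: git
  source:
    uri: https://git.example.internal/ops/cf-manifests.git
    branch: master

jobs:
- name: deploy-cf
  plan:
  - get: cf-manifests
    trigger: true                 # any configuration change triggers a deploy
  - task: bosh-deploy
    file: cf-manifests/ci/tasks/bosh-deploy.yml

- name: smoke-tests
  plan:
  - get: cf-manifests
    trigger: true
    passed: [deploy-cf]           # run the tests after every successful deploy
  - task: run-cf-smoke-tests
    file: cf-manifests/ci/tasks/smoke-tests.yml
```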
The community has done a really great job on Prometheus, because, I mean, you just push an application to production and you have more than 100 KPIs coming out of the box. Even compared with some vendor tools, which are pretty expensive, it's just wonderful. Those were the use cases that helped us move from a pretty geeky, technical platform to one where the boss says: oh yeah, okay, it looks fine, I have my org, my space, my application; okay, it's pretty professional at last. So Prometheus is, in my own opinion, a must-have in production for everybody running their own platform, of course.

And the multi-data-center federation; there's a configuration sketch at the end of this section. We decided, in terms of architecture, to have the Prometheus servers for the marketplace separated from those for Diego and Cloud Foundry. Both of them are scraped by a Prometheus master, and the masters talk to each other to federate their metrics. That's how we achieve high availability. I can't promise nanosecond or millisecond accuracy, okay, but for our use cases, where we don't have the kind of requirements you find in banking or real-time systems, it's fine for us, even with a one or two minute delay in the federation of the KPIs.

And what do we plan next? Three things. The three Rs of security; it's from a consultant at Pivotal, I can't remember his name, but we are trying to achieve the same thing as Google and the others: rotate your credentials and your passwords often, repair your security, your CVEs, as fast as you can, and repave your containers, say, once a day. These are being done manually today, and we need to automate them, you know, with things like CredHub, that kind of thing. We have a lot of work to do on that. And as you can see, we have a Terraform provider for CredHub, meaning you can manage your passwords as Terraform resources, in the same Terraform state file, without having your passwords in plain text in the Terraform files. So it's pretty cool.

We want to have chaos engineering: not only reacting to incidents, but producing them. We have a few outages in production, of course, but I think it's a pretty good habit for an operations team to regularly take its own infrastructure down. And automate, automate more. So we have started a number of projects: cf-ops-automation, which has two parts, templates for BOSH and a framework; only the framework is open source, and we are working on open sourcing the templates as well. And yes, maintenance of the CF web UI, basically because of the Cloud Controller evolution in terms of API.

And challenges ahead. The marketplace: big stuff for operators. We really need to have full, totally automatic provisioning of services. A lot of people just put static credentials into a service broker, or user-provided services, or service keys, and manage their clusters out of band, without the /v2 provisioning verb, for whatever database or service it is. We'd really like to have, like Amazon or the others, the /v2 provisioning calls used in production to provision clusters on demand and on the fly; that verb is sketched below. So pretty big challenges ahead.

Oh, yes. When you are an operator, you spend a lot of your time dealing with network troubles, and one of the most important ones is flow management. So we have to find a way; we are thinking about using service broker information to manage and operate external devices such as load balancers and firewalls.
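Going back to the federation setup for a moment: here is a minimal sketch of what a federating scrape job looks like in Prometheus. The target hostnames and the match expression are hypothetical, but the /federate mechanism is standard Prometheus; the one-or-two-minute delay I mentioned is essentially this job's scrape interval.

```yaml
scrape_configs:
- job_name: federate
  honor_labels: true              # keep the original job/instance labels
  metrics_path: /federate
  params:
    "match[]":
    - '{job=~"cf.*|diego.*"}'     # which series to pull from the peers (made up)
  static_configs:
  - targets:                      # hypothetical peer masters in the other DCs
    - prometheus-dc2.internal.example:9090
    - prometheus-dc3.internal.example:9090
```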
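And to be concrete about that provisioning verb: in the Open Service Broker API, on-demand provisioning is a PUT against the broker. A hedged sketch, with made-up broker URL, credentials and GUIDs:

```shell
# Ask the broker to create a real cluster on the fly, instead of handing
# out static credentials to something provisioned by hand.
curl -X PUT "https://broker.example.internal/v2/service_instances/6d9a8c3e" \
  -u admin:secret \
  -H "X-Broker-API-Version: 2.13" \
  -H "Content-Type: application/json" \
  -d '{
        "service_id": "postgres-service-guid",
        "plan_id": "cluster-plan-guid",
        "organization_guid": "org-guid",
        "space_guid": "space-guid"
      }'
```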
Coming back to flow management: for example, and it's not very clean in our heads yet, but we know that, for the time being, when a provider wants to expose a service on a port, let's say 9000, they have to go talk to people and say: let the flow come to this port, to my application, to my services. It's not dynamic. It's not automatic. So we need to work on that.

And the last one: we started a project a few months ago with Apache NiFi, to try to feed our machine learning tools with a lot of metrics from the Loggregator, basically. It's a very early-stage project, but we are hoping to have more than the Resurrector in BOSH, to manage infrastructure differently, in terms of operations, with machine learning, in supervised mode, of course, not unsupervised mode.

So thank you very much for listening. And if you have any questions, feel free to ask.

[Audience question, inaudible]

So, the first thing is to see if the model we feed is good in terms of outage detection, meaning: when you ask me, do you have an incident or not, I put a lot of things together in my head, from my own experience, to say whether we have a problem or not. We are trying to achieve the same kind of detection from the models. The next thing, of course, is to have, in supervised machine learning mode, some predefined policies that the machine will try to apply based on its own detection model. And, I'm dreaming, but the Skynet-like thing is the unsupervised mode: having some neural networks detect outages we wouldn't see, and try to work out and repair things for us. But that's maybe five or ten years from now. For now we are starting in supervised mode, training the model to detect what we call outages.

[Audience comment, inaudible]

Just for trying it out? Ah, yeah, for trying. Okay, that's the idea. Basically, that's the idea we have. But yeah, keep on dreaming. Thank you. Any more questions, about VLANs, network stuff, operator stuff, how you manage not to know Git commands in 2015? No? Okay, thank you. Thank you.