Okay, the topic of my talk: automate all the things. I like automation; it's really one of my favorite things to do. In our environment we use Pivotal Cloud Foundry, and we have the Pivotal Operations Manager. I don't know how many people in the room have experience running or operating a Pivotal Cloud Foundry deployment. That's quite a lot, good. So you have all heard about the Operations Manager, how nice it is and how it allows you to install and upgrade the environment. My name is Onderbrouwer, I'm a technical specialist working at Rijkswaterstaat. Rijkswaterstaat, in a nutshell, is the executive organization of the Dutch Ministry of Infrastructure and Water Management. What we do is make sure that people can move around on roads and waterways, that they have clean drinking water, and that they keep their feet dry; we have almost 9,000 employees and growing. The Pivotal Cloud Foundry project started in December 2015. I myself had joined Rijkswaterstaat a few months earlier; at the time I was responsible for managing the corporate website and also for deploying rapid application development applications, running on a kind of RAD framework. So I have seen the traditional way of doing deployments: it took about a month to get a new application onto the platform, and you had to do a lot of configuration by hand, while everybody who runs Cloud Foundry knows the platform does that for you. When they invited me to see the deployment of Cloud Foundry, Pivotal came in on Tuesday and left on Wednesday, and the platform was up and running; the first app went to production about a week later. Put down the whole platform in one day, and the next day you can push the first app: that was quite impressive to me. That was not how I was used to software deployments going, so that was cool, and I wanted to know more.
There is this 1,500-page PDF document you can start reading. Okay, that's an interesting road to go down, but we wanted to know more. We wanted some training, so we asked Pivotal: can you do some kind of training on the platform? We got the PCF immersion training, which was a whole week; we got the whole breakdown, from cf push up to creating and binding services and everything. Afterwards I knew everything a developer would do using the platform, but there was not much in the way of operations training at the time. I know they now have excellent dojo programs for that, but I don't think they really existed back then. So it was a really exciting time for us when we could finally do our first actual upgrade, but it was a bit on the late side: a few months earlier we had had a security scan by the security operations center, and they said, your platform is vulnerable, it's not up to date, your Ubuntu release is old. But we had to wait until we had a test environment, because we wanted to be able to test the upgrade before we did it in production. The first upgrade we did was from PCF 1.6 to 1.7, and then on to 1.8, because we were two releases behind. That all went fine, and the Ops Manager makes it really easy to do. Unfortunately, some applications turned out not to be compatible with the new version of UAA that came with the new release, so we had to go back; at least we got to test the rollback procedure. It was a bit of a rough road in the beginning. We promoted the platform within the organization, and upper management basically says: whatever new applications you build should go PaaS first. Unless you have a really good reason to go to Kubernetes or other technologies, we want to be PaaS first. The benefits are obvious to anyone who has seen it in operation.
So, the uptake. The red line here shows the growth of the number of applications on the platform; we currently have over 250 applications in our production environment. More interesting to me is the blue line: the number of application updates. In the beginning very few updates were going on; later, when people got more used to the platform and more confident that they could update an app and it would keep working, it really started increasing, and it keeps growing. I don't do this all by myself. In the beginning we were a very small team, just two guys running the platform; now we are five. I think the most important thing when you operate a platform like this is that you have to be very Linux-focused; Windows engineers have some relearning to do, I guess, because a lot of things go through the command line. And most important of all, people have to be eager to learn. When I first heard about Cloud Foundry, it was like: okay, a new platform, what do I have to do? I have to learn about BOSH, I have to learn about the runtime, and we want to use automation, so there's Concourse, there's the fly CLI; there are so many tools and new things coming that you really need to be able to adapt and learn. What do we do? Well, first of all, of course, maintain, monitor and upgrade the platform. We also have an information channel towards our customers. We have two wikis: one for ourselves, to keep track of any technical details of the platform we need to know, and one for our customers, where we tell them how to do things like deploying an app, creating a MySQL service, or configuring single sign-on, which seems to be a rather difficult topic at times. Basically, we try to keep people involved: whenever someone runs into a problem we haven't documented before, we figure out a solution and put it on the wiki so everybody can find it there.
And the wiki runs on GitLab, so it is version controlled, which is cool, and it uses Markdown, so we can automate things there: we can generate pages automatically, and Markdown is a really easy technology for that. Another thing we do is organize workshops. We run them on a regular basis, a few times a year at least, and the feedback is always extremely positive. We even had a management session where managers themselves pushed an app into the cloud: ah, see, it's easy. Of course, it only starts there; you have to do more. But the workshops are always very successful and well attended. On our innovation day in December we're going to do another workshop, and this time we're going to show not only how to push an app but also how to automate it: use Concourse, make pipelines. That's when it gets really exciting. We also help developers with, or facilitate, the onboarding of new applications, especially for people who have never seen Cloud Foundry before. We had one case: oh, we have this PHP app, what do we need to do to make it run? Well, what does it do? It has a database. Okay, let's import the database, push the app to the platform, and let the PHP buildpack do its thing. Hey, it runs. It turns out you don't need to do much at all to get it running. For the developer it was: wow, I pushed my application to Cloud Foundry, cool. For us it's not a big deal anymore, but the developer experience is really very positive across the board. But what I like most: automate everything. Now I get truly emotional. We don't only have Pivotal Cloud Foundry running. On the left you see the stack, well, kind of a stack. Usually when you deploy Cloud Foundry, you start with the Operations Manager: you import the Operations Manager as a VM.
The Operations Manager spins up the BOSH Director, and the BOSH Director spins up all the components of the platform. But we need more than that: we need monitoring, we need automation, we need storage. So on the right side we have a standalone Linux jump box, from where we deploy a standalone BOSH Director, which manages our storage: a Minio deployment with an S3-compatible interface, where our backups go; Prometheus for monitoring of the platform; and Concourse for automation. This is how we did upgrades in the beginning: the nice Operations Manager GUI, very easy to use. You click on tiles, you fill in configuration parameters, and then you click the big blue Apply Changes button. Well, there is more to it than that. Before you do that, you first have to make a backup: export the installation settings, make a backup of your blobstore and your databases, shut down the Ops Manager, spin up a new one, import the settings, import the new tiles, make any mandatory configuration changes, upload new stemcells. And then you hit the blue button, and you wait. Unless something goes wrong, that is, and this was always the most worrisome part to me: if you have to go back, you have to delete everything and restore from backup, and then you have an outage. So in the beginning we always had to do it at night, worrying about the service window: will we still be able to roll back in time? I'm happy to say that nowadays, with the confidence our customers have in the platform, we can do upgrades in the daytime. Minor patches we do fully automated; majors we still do by hand, because there might be changes in configuration. And for automation we use pipelines. Here is a screenshot of our Concourse CI/CD dashboard, which shows all the pipelines; we're still working on things, so it's not all green.
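To give an idea of what such an automated minor-patch run looks like, here is a minimal Concourse pipeline sketch; the resource and task file names are hypothetical, not our actual configuration, and it assumes a nightly trigger with tasks wrapping BBR and the om CLI.

```yaml
# Hypothetical sketch of the nightly backup-then-patch pipeline.
# Task files and timings are illustrative, not our real setup.
resources:
  - name: nightly
    type: time
    source:
      start: "01:00"
      stop: "02:00"
      location: Europe/Amsterdam

jobs:
  - name: backup-and-apply-changes
    plan:
      - get: nightly
        trigger: true
      - task: bbr-backup          # BOSH Backup and Restore of the foundation
        file: tasks/bbr-backup.yml
      - task: apply-changes       # om CLI against the Ops Manager API
        file: tasks/apply-changes.yml
      - task: smoke-tests         # verify cf push and services still work
        file: tasks/smoke-tests.yml
```

The smoke-test step at the end is what makes it safe to let this run unattended: the pipeline only goes green when the platform demonstrably still works.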
It's all work in progress, but we have nearly 40 pipelines running at the moment, which do all kinds of things like monitoring, upgrading, and reporting. Concourse also has a command line interface; there's a screenshot here, but it doesn't add much at the moment. For upgrading, we have several different pipelines running. We automatically upgrade the Operations Manager, PAS (or the Elastic Runtime, as it used to be called), the buildpacks, and all the other tiles. Every day we run BOSH Backup and Restore to make an automated backup of the platform, and linked to that is the automated apply changes for the minor patches. And the coolest of all: when the pipeline is finished, we get a message, so we can do other things until either something goes wrong or the job is done. In our operations room we have a very big screen rotating through displays. One of these is our main pipeline, with the apply changes shown at the top; unfortunately it's not quite readable at this resolution. Basically what we do is upgrade test first, then run a smoke test to make sure everything is still working, and then repeat the same cycle for production. Next to that, on the bottom-right screen, we also have the same smoke test on a slightly different schedule, running every five minutes to check the platform. It was created by the guys sitting here in front in orange, from ITQ: great guys, wonderful software. It has helped us several times to find problems before our customers did. Zooming in on the smoke test, which repeats every five minutes: it basically does a deploy, it does integration with UAA for all the authentication scenarios, and then it runs its tests. And what do the tests do? Basically they check whether you can push an app, but also whether the services we offer are working. The idea is that we can give this to the help desk.
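The shape of such a smoke test can be sketched as follows; the check functions here are stubs standing in for the real probes (pushing an app, logging in against UAA, binding a service), which are assumptions for illustration.

```python
# Minimal sketch of a recurring platform smoke test. Each named check is a
# callable probe; the real probes would push an app, authenticate, and
# exercise the marketplace services. Stubs are used here for illustration.
from typing import Callable, Dict

def run_smoke_test(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every check; one failing probe must not abort the rest."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = check()
        except Exception:
            results[name] = False
    return results

def all_green(results: Dict[str, bool]) -> bool:
    """The light the help desk looks at: green only if every check passed."""
    return all(results.values())

# Example run with stubbed probes:
results = run_smoke_test({
    "cf_push": lambda: True,      # can we push an app?
    "uaa_login": lambda: True,    # can we authenticate via UAA?
    "mysql_bind": lambda: False,  # can we create and bind a service?
})
print(all_green(results))  # prints False: one broken service turns the board red
```

Running the same checks every five minutes and feeding the aggregate result to a dashboard is what lets the help desk answer "is it the platform or the app?" at a glance.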
And if somebody says, hey, this app is not working, and it's running on Cloud Foundry, the help desk can look at the dashboard. If it's all green, maybe it's your app that is broken, not the platform; that is always our worry, that the platform gets blamed while something else is actually broken. This allows us to easily see where the problem is, and then we can investigate. We also need to upgrade the standalone parts themselves: Concourse, BOSH, Prometheus, and the Minio backup server, so we have pipelines for that too, and we have a nice pipeline which collects all changes on all Directors. For monitoring, I already mentioned Prometheus; that was the first monitoring component, which we installed with the help of Pivotal. That was really cool, because before that we had no clue about the performance and the capacity of the platform and how it behaved in relation to all the applications people were pushing to it. Not too long ago we also installed PCF Healthwatch, a nice product which basically checks the Pivotal KPIs, with a focus on the end users on the left side, the developers in the center, and the operators, people like us, on the right. But it's not a perfect tool. We noticed that recently, for example, with DNS: DNS was broken, and the smoke test caught it, but PCF Healthwatch didn't, because it was using internal DNS, which was cached. We also have integration with Splunk, where all our logging goes, and we have some nice executive management dashboards where you can see that everything is up and green. The wiki I mentioned earlier, the customer-facing one, I can't show in full; there are a lot of pages and a lot of topics there. But we put our reporting there. The wiki is based on Git and we can use Markdown, which, as anyone who knows it will agree, is a pretty simple format, much easier than HTML. So we can create scripts, running in Concourse, that generate reports automatically.
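One such generated report is a Markdown page listing deployed product versions. A sketch of what that script could look like is below; the input record shape and the Pivotal Network URL pattern are assumptions for illustration, not our exact implementation.

```python
# Sketch of a Concourse-run script that renders deployed product versions
# as a Markdown wiki page. Record fields and the PivNet link pattern are
# illustrative assumptions.

def versions_page(products):
    """products: list of {"name": ..., "slug": ..., "version": ...} dicts."""
    lines = [
        "# PCF versions",
        "",
        "| Product | Version |",
        "| --- | --- |",
    ]
    for p in products:
        link = f"https://network.pivotal.io/products/{p['slug']}"
        lines.append(f"| [{p['name']}]({link}) | {p['version']} |")
    return "\n".join(lines)

page = versions_page([
    {"name": "Ops Manager", "slug": "ops-manager", "version": "2.3.1"},
    {"name": "PAS", "slug": "elastic-runtime", "version": "2.3.2"},
])
print(page)
```

Because the wiki lives in Git, the pipeline can simply commit the regenerated page after every apply changes, so customers always see the current versions.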
For example, recently a new PHP buildpack came out: okay, there's a vulnerability, we have to patch it. But how do we figure out which applications use the PHP buildpack? We would have to go through all orgs and all spaces. Well, let's make a script for that, put it in a pipeline, and then we have a nice report. Things like that, and it's growing. The screenshot here is of the PCF versions page, which our customers like very much. After each apply changes the page gets updated, so they can see exactly which version of each product we are running. It's really good feedback towards the customers, and if they click the blue link, they get directed to the release notes on PivNet. This is also an interesting one: the MySQL quotas. Anyone running PCF with a MySQL cluster in it? With the quota enforcer? Nobody? Apparently not. The thing is, if you run out of quota, your database gets put into read-only mode, and the only thing you can do at that point is delete data. You want to avoid that, so we want to tell our customers: look at your quota, and do something when it gets into the danger zone. Because we basically promote the platform as a self-service platform: we want our developers to manage their own applications. We are not there to manage the applications; we are there to manage the platform. Lessons learned, the important parts for us. First of all, if you run a platform like this on-premise, it's extremely complex, with a very large number of components. Pivotal makes the initial deployment easy: hit the Apply Changes button and off it goes. But you really start learning how things work when the apply changes fails and you have to dig into logs and poke around in the parts to fix things. It's easy to get started, but you really start learning when things go wrong, and there are a lot of things which can and did go wrong in the past.
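The buildpack report script can be sketched like this; the app records mimic what you could collect by walking all orgs and spaces with `cf curl`, but the exact field names are assumptions for illustration.

```python
# Sketch of the "which apps use this buildpack" report. The flat app
# records below stand in for data gathered across all orgs and spaces;
# the field names are illustrative assumptions.

def apps_using_buildpack(apps, buildpack):
    """Return (org, space, app) tuples for apps on the given buildpack,
    sorted so the Markdown report is stable between runs."""
    return sorted(
        (a["org"], a["space"], a["name"])
        for a in apps
        if a.get("buildpack") == buildpack
    )

apps = [
    {"org": "rws", "space": "prod", "name": "portal", "buildpack": "php_buildpack"},
    {"org": "rws", "space": "test", "name": "api", "buildpack": "java_buildpack"},
    {"org": "water", "space": "prod", "name": "site", "buildpack": "php_buildpack"},
]
for org, space, name in apps_using_buildpack(apps, "php_buildpack"):
    print(f"{org}/{space}/{name}")
```

Put in a pipeline, a script like this turns "which apps are affected by this buildpack CVE?" from an afternoon of clicking into a report that is always up to date.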
Also very important: when you run a platform in an enterprise, how do people get access to it? Usually your workplace is locked down; you can't just install tools like the fly CLI or the cf CLI, so you have to find some way to facilitate that. Naming: we did a quick and dirty deployment in the beginning, and we had no separate system org or system domain; our production domain was simply called production. That was not really a good idea in the end. Self-service: well, I mentioned it before, our developers are supposed to manage their own applications, but we have been a little bit lenient in the past. Oh, you need a backup? Okay, I'll make a backup and restore it for you. And they keep coming back, even though we put it on the wiki: you can do it yourself. Ah, but this is hard, can you please... You have to think about who is doing what when you run your own on-premise deployment. Manifests in version control: that's a best practice. You're supposed to use them; they make things easier for yourself and your fellow developers when you push applications, because everything is controlled, and test, acceptance and production all have the same configuration. If you don't do that, you might run into: it works in acceptance, but when I pushed it to production it failed, because it was a different setup, a different sizing and dimensioning. Data is hard. I mean, Cloud Foundry, twelve-factor, stateless apps, okay, but where does my data go? We are running on a MySQL cluster now, and we are going to move to Postgres. If you do or consider that kind of move, you have to know the similarities and the differences of the technologies involved; we really need a DBA, and we are hiring. Proactive monitoring: I mentioned the smoke test earlier, running every five minutes; that's basically part of our proactive monitoring. When things go wrong, like with DNS, we can catch it before our customers do.
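The manifest point is worth making concrete. A minimal, version-controlled manifest.yml might look like this; the app, service, and buildpack names are illustrative, not from our environment.

```yaml
# Hypothetical manifest.yml kept in Git, so test, acceptance and production
# are pushed with the same reviewable configuration and sizing.
applications:
  - name: my-app
    memory: 512M
    instances: 2
    buildpack: java_buildpack
    services:
      - my-app-db        # the bound MySQL service instance
    env:
      LOG_LEVEL: info
```

With the manifest in the repository, "it worked in acceptance but failed in production" stops being a mystery: any difference in sizing or configuration is visible in the diff.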
And we really need this kind of thing, because anything that can break will break; almost everything we depend on has broken at some point in time, and then we have to be able to pinpoint it and fix it. Blind spots: this is also a nice one. Don't fall into the trap of thinking you've got all your monitoring set up and you're done; it's a continuing process. For example, the BOSH Director manages the VMs it deploys, but who is watching the BOSH Director? Its persistent disk filled up one time, and we didn't know until apply changes failed. Okay, so we need to add something to our monitoring to watch BOSH. The Operations Manager also ran out of memory at one point; we hadn't thought about that either. Things like that: you always have to keep improving. You're not done when the deploy is finished; it's an ongoing process. And then, well, Cloud Foundry itself is highly available, stable, et cetera, but we always depend on external things like DNS and certificates. We use single sign-on with integration to ADFS, we have connectivity through load balancers, SharePoint was mentioned, okay, we have that too. Any of those might fail, and the chain is only as strong as its weakest link, so you have to be aware of that, and you have to think about the SLAs to put in place: if we want to run 24/7, the components we depend on have to have 24/7 service levels as well. Any questions? "How are you managing the major version upgrades, which can't be easily automated?" Good question. We currently base our upgrades on pcf-pipelines, and I understand from Pivotal that they're also working on platform automation tooling which would make this kind of thing easier to build. But as Pivotal themselves recommend, we only use it for minor upgrades. Upgrades that change the second version number, the majors, we usually have to do by hand, because you also need to do additional configuration.
But those don't come as often as stemcell patches or minors, so yes, majors we still do by hand. We do have the mandate from our customers to do them in the daytime, which is much better than doing them at midnight. No further questions? Okay, well, thank you all for coming and listening to my talk. And I would like to add: maybe we will see each other again next year. It's my hometown; come visit the lovely city. Thank you.