My name is Hans Christian. I'm a platform engineer at NAV, a Norwegian government organisation. I left Bergen today, so I actually had to wake up before my kids, and that never happens; they're always up first, so for me it was really the middle of the night. Apart from really loving to work with the cloud-native technology we have here, I enjoy the outdoors and have previously been active in the Norwegian Trekking Association. This photo was taken from my plane leaving Bergen for Amsterdam, so you can see wonderful western Norway and the sunrise.

So, what is this NAV thing? It's the Norwegian Labour and Welfare Administration. That's a mouthful and I always mess it up, so I'll just stick to NAV. It's the axle of the Norwegian welfare society: we help our citizens through all the stages of their lives. When you're born there's child benefit and parental benefit; if you get sick or lose your job there are sickness and unemployment benefits; and there are disability benefits and ultimately the retirement pension. All through life, NAV is there, and it's the basis of the welfare system in Norway.

I realized on the flight that the programme says "Legacy Platform Security". I'm not talking about legacy platform security. I will talk a little about our legacy and where we come from, then about our platform and platform engineering at NAV, and then of course about how we make it all secure.

Because in a galaxy not that far away, this is how NAV looked, and I bet most of you working in companies more than two years old have experienced something similar. It was very monolithic. It was a walled garden, a village: outer walls to keep all the dangerous animals out, and everything nice and peaceful inside. Very few dependencies, yearly releases. I was told stories of coming in on a weekend, once a quarter, and literally standing in line to get your things deployed. And if the deployment before yours failed, you just had to go home and try again next quarter, because there was so much mess to clean up. And of course it was physical servers. We still have some of those servers, we still have a mainframe, and we are still working our way through all of the legacy.

The organization looked something like this: NAV, the overall organization, and then NAV IT, and they didn't really talk much together. It was "go build this, we ordered that", and NAV IT would say "okay, if you say so" and hand out contracts to subcontractors. There weren't really any employees, at least not developers; there were lots of architects and product managers at NAV IT. And of course this creates challenges. It was really costly to organize things this way. Everything under the sun was manual, and because bureaucracy, there had to be people in the loop at all times. It was really error prone, go figure, which breeds the attitude of "let's go as slow as we can, because then there will be fewer errors". And you got disengagement all around: people weren't incentivized to bring out their best and do their best work. That just wasn't what the system was geared towards.
Then something changed. In 2015 we got a new director of NAV and a new director of IT, and they decided: guess what, we are going to change this. We are going to hire developers. So they actually started hiring developers. And in 2017 we started our own Kubernetes cluster on premise, because we had brought in some really good people from up-and-coming private companies in Norway who were doing Kubernetes and doing really cool stuff. They kickstarted it and got people to rally behind them: we can really fix NAV, we can fix Norwegian welfare. The bars in this graph are the average number of weekly deploys per year. You can see it takes some years, but then things really start to skyrocket. A lot was clearly going on in 2022, and now it's dipping slightly; I think we are plateauing and that was the peak. There was a lot for NAV to do during the pandemic.

What we ended up with very early on was a custom operator. The platform is called NAIS, and the operator is called Naiserator. As you can see, it's quite basic. I assume all of you know your ins and outs of Kubernetes and love and breathe YAML, so there's nothing fancy here: it's just an Application, which takes a thousand and one parameters that I have omitted, all optional of course. What you get out of it is your Deployment, Service, Ingress, ServiceAccount, NetworkPolicy and custom resources, because guess what: these things are implementation details. We can't expect our developers to care about them. If they want to, great, they can, but they don't need to. As an application developer you should be focused on building the services your end users are going to use, not on what's underneath. Why do I need this Deployment? Why do I need to know all of these resource manifest specifications? Here, you only need one manifest to rule them all: our nais.yaml (there's a sketch of it just below).

Actually, it's not only one operator at this point; we have twelve, because we learned to love those operators, and it's so easy with the Kubebuilder project. You can scan the QR code here and it takes you to our GitHub repository. All of these are open source, and some of them are actually used outside of NAV. That's the goal: to give back and make sure we don't reinvent the wheel over and over again, and we are starting to see that turn more and more in the public sector in Norway.

So, NAIS v1, the first version of the platform. There's really nothing fancy here: developers work in Git, CI/CD deploys their NAIS Application to our Naiserator, and it sets up the application just as you would expect, without them needing to know much about Kubernetes. All of this ran on premise. There's a long tail and lots of red tape in getting out of on-prem and onto cloud, so let's start here: we could start modernizing and improving from day one without having to do all of those exercises at once. So we got to start modernizing and containerizing, and we have Grafana and Prometheus.
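To make that concrete, here is a minimal sketch of what such an Application manifest might look like. The fields shown are illustrative assumptions based on the description above, not a verbatim copy of the nais.yaml specification:

```yaml
# Minimal sketch of a NAIS-style Application manifest.
# Field names are illustrative assumptions, not the exact specification.
apiVersion: nais.io/v1alpha1
kind: Application
metadata:
  name: my-app        # hypothetical application name
  namespace: my-team  # hypothetical team namespace
spec:
  image: ghcr.io/my-org/my-app:1.2.3  # the team's container image
  port: 8080
  replicas:
    min: 2
    max: 4
  ingresses:
    - https://my-app.example.com
```

From this single resource the operator derives the Deployment, Service, Ingress, ServiceAccount and NetworkPolicy, so changing how those are generated is a platform-side rollout rather than a change every team has to make.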
The previous speaker mentioned a centralized Grafana that calls out to all of the different Prometheuses; we have the same setup, so that was nice to see. Then you have your users coming in through our F5 BIG-IP load balancer to our on-premise environment, which talks to some legacy database system that we would like to modernize as well. We started in one place, got people excited, got the deployment frequency going, and actually got these applications updated.

But we didn't stop there, of course. To really get the ball rolling we needed to get out of our data centers and onto public cloud, because let's be honest: our job isn't to run Kubernetes, our job isn't to run databases, our job is to provide welfare services to all of Norway. But we had some requirements for leaving our data center, leaving our walled garden, which by then was well and truly punctured. We said: let's move away from the tiered architecture. We need to go zero trust. We need mutual and transparent encryption. We need policy automation. And we want as many managed services as possible, so we don't have to know and secure all of the nitty-gritty details ourselves. We need to configure them correctly, of course, but let someone else manage them.

Those were our requirements, and going back to our graph: in 2019 we got our first production services running on Google Cloud. There was some deliberation about where to go, since at that point no public cloud provider had a region in Norway. Google had opened a data center in Finland, so that was not that far, and Norwegians are friendly with our neighbours: the Swedes and Danes and Finns, and Iceland, I guess; that's roughly Scandinavia. From there the graph just skyrockets, because all of our developers said "we want to go to cloud, we want to be part of this as well", and there was a lot of positive engagement.

As I said, we needed policy and encryption, and at that point, having no experience with service meshes, we went with Istio and really got to experience all of the pain and frustration of a very heavy-handed service mesh that tries to do everything under the sun. After some time we moved to something new and much better: Linkerd. As our lead developer wrote (the QR code goes to the blog post the quote is taken from): "It has to be said, there is a certain satisfaction in cleaning up after a party that has been going on for too long." I can't agree more. And the migration was done in a matter of hours, not days or weeks. There was planning ahead, of course, but because we had our operator and controlled how all of the downstream manifests are generated, we could make the change in one place, roll it out, and have it progressively redeploy all of the applications.
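That is part of why the switch could be that fast: since the operator stamps out every workload, the mesh choice boils down to an injection annotation on the generated pod template. A rough sketch, using the standard injection annotations for each mesh (the surrounding structure is illustrative):

```yaml
# Rough sketch: the operator controls the generated pod template, so switching
# meshes is one change in the operator rather than a change per application.
# Both annotations below are the standard injection switches for their mesh.
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled        # after the migration: Linkerd
        # sidecar.istio.io/inject: "true" # before the migration: Istio
```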
We also had a dependency graph, so we could see clusters of applications that depended on each other and redeploy those together, so they wouldn't have to talk across two different service meshes mid-migration. In a matter of hours we had everything running on Linkerd. A really, really huge success, and as I said, the QR code goes to the blog post that describes it.

There haven't really been many problems since: lots of performance improvements, less overhead, and the worry of "is this related to the service mesh?" has been practically non-existent. We did of course forget to renew our root certificate one time, and that was a real pain. As the previous speaker talked about, if you rotate while the old root is still valid, you get a zero-downtime rotation: workloads just pull in the new one and you're good. But if it has already expired, guess what, you need to restart everything, and we had some downtime due to that. We also saw, under some really unexpected heavy load in the autumn, some performance issues, or rather some request settings that we had to tune, and we haven't really had any problems with Linkerd since.

And the whole NAIS application experience in the cloud? It looks no different from how developers deployed to our on-premise environment, except that you get all the new, cool, shiny features that you otherwise had to order, wait for, get back misconfigured, and go another round on until you finally got what you needed. Now it can finally be automated.

So we have the same base setup in NAIS v2. We implemented Snyk for code security scanning, and have since moved to GitHub Advanced Security and Dependabot. In our CI/CD we have started implementing SLSA for supply chain security. For cluster security we started with OPA and have since transitioned to Kyverno: OPA does a lot of things, and the bar for integrating it was a lot higher than with Kyverno. It hasn't been an easy migration, there have been some quirks there as well, but we feel Kyverno is better suited to our needs when it comes to cluster policies for how containers in our clusters should operate.

Going zero trust, we needed network policies all the way, and that wasn't something we enforced in our on-premise environment. Network policies can be a pain if you have to, I wouldn't say write them manually, but operate on and work with individual policies. So we extended our NAIS Application: you simply declare that application A should be able to talk to application B, and that application B should be able to receive traffic from application A. Keep it simple. For external traffic we use something called FQDN network policies, an external operator made by Google that does the same for hostnames: you allow traffic to a given host, regardless of which IP addresses it resolves to, and the operator generates regular network policies under the hood. So it's turtles all the way down.
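As a sketch of how that declaration reads in the manifest, it might look roughly like this; the field names are illustrative assumptions, not the exact specification:

```yaml
# Rough sketch of an access policy section in a NAIS-style manifest.
# Field names are illustrative assumptions, not the exact specification.
spec:
  accessPolicy:
    inbound:
      rules:
        - application: app-a       # app-a may send traffic to this application
    outbound:
      rules:
        - application: app-b       # this application may call app-b
      external:
        - host: api.example.com    # FQDN rule; resolved IPs are handled for us
```

From intent like this, the operator renders the underlying NetworkPolicy (and FQDN network policy) objects, so deny-by-default stays the baseline while teams only declare who talks to whom.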
So with that, this is how far we are in our journey. One of the main selling points of going to cloud was being able to provision external resources: databases, buckets, the kinds of external systems that were once manually provisioned. Again we extended our NAIS Application manifest, so you can just say "by the way, I would like an SQL database, please" (there's a rough sketch of this at the end of the section). We really want all of our applications to have their own database; we don't want applications sharing one, and we don't want users manually setting up their database, manually setting up access, et cetera. All of this should of course happen automatically, so again, Naiserator to the rescue.

And then finally, securing the traffic on its way in. I mentioned that we had some unexpected traffic in the spring of 2022, and back then we didn't have DDoS protection at that level; we had some on premise. We moved all of that to Google Cloud as well, with Cloud Armor DDoS protection, so now Google does the DDoS protection for us, and when those surges hit, we don't really feel it. NAV has really predictable traffic; we don't get huge organic surges. The exception was during the pandemic, when a lot of people were laid off and needed their benefits to continue, but apart from rare events like that, if we're being flooded it isn't organic, it's something we need to filter out. Having that managed by someone else takes a lot of burden off our platform team.

That is really the high-level overview. There are of course lots of other bits and pieces to dig into, for instance using WireGuard for secure connections from our developers into our clusters, and various other tidbits, but I only have so much time.

Some nice stats here. We didn't lift and shift all of the applications onto cloud; that would be a horrible thing to do, because we have a lot of dependencies on our on-premise environments that can't simply be lifted to cloud. It needs to be a conscious move by the teams, so it's they who control which environment they run in, but all of the new and shiny stuff is in the cloud. You can see from the graph that we are steadily moving over: we passed the 50% mark before summer, and we are now at, I believe, 62% of our applications, so 1,000 applications running in cloud and 600 running in our on-premise environment. We haven't set a date for turning off the on-premise environment, but at some point we hopefully get to set that cut-off date. And we are looking at some new exciting features going forward.

Summing up what has worked for us: giving the teams autonomy over their applications has been a game changer; it incentivises them to care about doing a good job and actually enables them to do one. Declarative config all the way has worked. Secure defaults, so they don't need to read and read and read and know every intricate detail just to get started. And again: making the right way the easy way.
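Circling back to the resource provisioning mentioned above, here is the promised sketch of what requesting a managed database in the manifest might look like. The field names are illustrative assumptions, not the exact specification:

```yaml
# Rough sketch of requesting a managed SQL database in the application
# manifest. Field names are illustrative assumptions, not the exact spec.
spec:
  gcp:
    sqlInstances:
      - type: POSTGRES_15     # hypothetical engine/version selector
        tier: db-f1-micro     # hypothetical instance size
        databases:
          - name: my-app-db   # hypothetical database name
```

The operator then provisions the instance and wires up credentials and access for that one application, so each application gets its own database and nobody is clicking around in a console.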
And with that I'm rounding off. If there are any questions, I believe we have a couple of minutes. We do have a couple of minutes, thank you. Are there any questions? Yes, all the way in the back.

"Hello, can you hear me? Cool. I have a question about this NAIS thing: what additional benefits does it have compared to having a base chart that you use to give developers an interface, a YAML file that they control, in the same declarative way? Are there additional benefits?"

So I believe the question was: what's the benefit of an operator versus a Helm chart or a base chart?

"Yeah, a base chart which gets imported and gives you more or less the same."

It gives you more or less the same, but I've done that before, and you really, really, really need to love YAML to make it work, and it still has some quirks. If you make a change that shouldn't really require the developers to redeploy, they still need to redeploy their application. You can automate that in some fashion, of course, but in our case, when we want to change something, for instance going from Istio to Linkerd, we just roll it out as a change in our operator. The operator knows all of the applications currently running in the cluster, so it does the migration on its own. And when you build in that many features, it goes a little bit without saying that you need a proper programming language and proper test suites to make it work correctly, or to have the confidence that when we make changes, we don't break 1,600 applications.

"All right, thank you. And the next question: if I need to run at that scale, I have 13 environments where I need to run the same application but with different configurations. I also use Helm, or helmfile or something else, to template and still get some abstraction out of it. Is it possible to get that abstraction with NAIS?"

Well, the NAIS Application is the abstraction there. You can of course use GitOps and Flux or Argo; we don't do that now. We have two environments, prod and non-prod, so it's not that many. And applications are typically deployed either to our on-premise environment or to the cloud, not both. So typically an application team, and I see the sceptical look there, will only have two environments they really need to care about for one application.

"Thank you."