Hello. When I left Norway on Monday morning, I had to walk through snow to get to the bus to the airport. When I arrived in Austin about 18 hours later, it was what we Norwegians call a really beautiful summer day. I thought I'd packed all the wrong clothes, and I thought I'd need to adapt to this. It turns out I didn't have to adapt at all. Austin adapted to me, because yesterday it snowed in Austin. Probably my fault. I don't know.
My name is Audun Fauchald Strand. I work for the Norwegian Labour and Welfare Administration. And today I'm going to talk about... oops, how do I get this to work? Yeah. Today I'm going to talk about how, by adopting Kubernetes on old legacy platforms, we've been able to solve a lot of the problems we had as an organization. We were stuck on premise in our own data centers. We had way too many test environments. We had access control that was too coarse, so it was a problem for developers. We had network zones that hindered almost any kind of development pace. We needed an overview of all our dependencies that we could trust. We had really bad monitoring and observability in our systems. It was difficult and cumbersome to create new applications. And we had loads of nightly batch jobs.
First I'm going to talk a bit about the organization I work for. Then I'm going to go through all the problems I just mentioned and how we solved them. Then I'm going to talk a bit about the platform we made, how we branded it and how it's working, and then a few conclusions.
But first, a bit about me. I used to be a Java developer for many, many years, and I think the fact that I'm a developer going into the infrastructure space makes a difference.
I've heard in loads of the keynotes how we want to start solving the developers' problems, increase the development pace and make better deployment pipelines. All of those things have been my main goals since I started to work with infrastructure. It's about increasing the pace and the development speed. So when I'd been a developer for like 10 years and had all this bitterness built up inside me, I thought I could use all those experiences and all that bitterness to try to solve the problems for all the other developers, so they didn't have to have all the problems I'd had.
So I started to work with infrastructure and platforms. I started with Kafka. I operated Kafka for a few years at my former company, and then I started the migration to Kubernetes. I presented at KubeCon in Berlin this spring, when I worked for a company called FINN. It is one of the largest websites in Norway, a classified marketplace. What I talked about then was how you can maintain continuous delivery whilst migrating to Kubernetes. That's been a great success. Right now FINN has migrated about 60 percent of all their services to Kubernetes, and at least in Norwegian terms it's loads of traffic. Even before the migration FINN had a really good pace, deploying to production a thousand times a week, and we managed to keep up that pace, even making it better, and the developers were very happy.
But I thought that problem was kind of solved.
I needed to find something more difficult than just migrating to Kubernetes in an organization that had most of the stuff in place already. So I started to work for the government.
The Norwegian Labour and Welfare Administration, NAV, was created in 2006 when we merged two different organizations. It has offices all around Norway, and the citizens of Norway, when they have problems, when they've lost their job or when they're sick or when they have kids, can come into these offices, talk to the people who work there and get help. Basically, we have 50 different benefits. We pay out about a third of the national budget of Norway, one of the richest countries in the world: unemployment benefits, pensions, child and parent benefits.
There are 600 people working in the IT department. Two years ago almost none of them were programmers. There were architects and testers, and for development we only used consultants.
When I got to NAV, I thought this is really old and strange and difficult, and then I realized there's a history here. NAV got its first computer system in 1967. The database of that system still runs. It is a large 10 gigabyte file. We built a new service on top of it, but we still use the database. It's in read-only mode now. It contains the salaries of people living in Norway from 1967 up until 2010, I think. And the big file is a bit difficult to handle.
So there's this old lady. She comes in on Tuesdays and Wednesdays. She was part of the team that built that system, and she helps the team that's now trying to do, like, archaeology in this large 10 gigabyte file, in a strange language I don't understand, to find all the data they need for calculating pensions for Norwegians.
That was the first system. And of course we have mainframe systems. Those are still in use. We have this really strange Oracle Java thing, created around 2000, where the instances of the Java objects live in the database. It's not like an app server, so all the developers have to have their own Oracle database to do anything at all. That's still running. We have loads of IBM WebSphere Java EE applications, and most of those are still running. We have an IBM MQ message queue, still running. Over the last five to ten years we've been migrating to more modern stacks like JBoss, and most of our applications are now on JBoss.
But we realized: well, we have all these other systems, so what's the harm in introducing a few more? So we started to work on Docker and Kubernetes.
Oops, I forgot about the images. "Prod" is Norwegian for production.
This was the production system at some point in NAV's history. And now we want to be somewhere around here.
NAV has had a private cloud for a few years, based on VMware. It's kind of self-service: the developers go into different web apps and provision the stuff they need. They get VMs for JBoss or VMs for IBM WebSphere, or databases. And there's a really strict handover between developers and ops guys. The developers have no access at all to production. If there's an error in production, the ops guys find it, and then they might call the developers later. At least until recently there was a three-month release cycle with a four-week test period afterwards, and there was a set of testers and release managers.
The culture was a bit more like this. We had these really, really large development projects, outsourced to consultants. And because that's scary, we built up a really, really large organization to specify and test all the projects created by the developers. So we had no developers, but architects, testers and release managers.
So as you can see, the challenge I wanted was bigger than the one I had before, where I just had to maintain continuous delivery whilst migrating to Kubernetes. Now we wanted to use Kubernetes to change this organization and all these technical difficulties.
And the reason I wanted this is this guy, my boss's boss's boss's boss. His name is Torbjørn, and he's the CIO of NAV. He's a former developer, so he understands things. He wants us to deliver at twice the speed at half the price. We want to build autonomous teams, have continuous delivery and have a "you build it, you run it" culture, so that developers actually run their own applications. And we're quite inspired by what's happening here in the U.S. with 18F, in the UK with GDS, and in Estonia, actually, which is really good at digitalizing its public sector.
So, what we created. To achieve this twice the speed at half the price, there are loads of things you need to do, and I don't know how to do most of those things, but I know how to make platforms. So I wanted to use my experience to build a Kubernetes-based platform to change as much as possible: changing the culture, increasing the development speed and improving the application architecture at the same time.
I was at the talk by the people from a German bank yesterday. You talked about how you had kind of the same challenge, but I think you had harder time constraints, so you needed to do this without improving the application architecture as much and without doing all the other stuff. We don't have those time constraints, so we want to use this opportunity to actually change more than just moving to the cloud.
So now I'm going to go through all the different problems we had and how we solved them using different Kubernetes concepts. A tiny repetition: we were stuck on premise. We had too many test environments. Access control was a problem for developers. We had network zones and firewalls that stopped almost any kind of development pace. We needed an overview of our dependencies that we could trust. We had bad monitoring. It was difficult to create new applications. And we had loads of nightly batch jobs.
So, the first part: we're stuck in our own data centers. Because we're a government agency, we have loads of sensitive data. We have something similar to the witness protection programs, and the addresses of those people are in our database. So we need to protect our data a lot, and we can't just move it outside of Norway without having proper control. And there are no public cloud data centers in Norway.
So we're stuck on premise, basically. But then again, there's loads of stuff happening in the open source and public cloud world which we want to leverage. We want to be able to use all those software-as-a-service and platform-as-a-service products that Amazon and Google and everybody else offer, both because it will make us quicker and because it will make everything cheaper. And we think we can move at some point. As far as we know, the law says it's legal for us to move outside of Norway as long as we have encryption and everything in place. But the timeline for doing that is really long. If we want to migrate everything, it's going to take a long rewrite, and we can't migrate our mainframe systems, for instance.
So the solution to this, or at least the start of the solution, is to use Kubernetes. When I speak to the developers in the development teams about what they need to do to make an application run in a public cloud, and what they have to do to make the application run in Kubernetes, at least the first 10 to 15 steps are exactly the same. So if we start by migrating applications to an on-premise Kubernetes solution, we can leverage that work when we are allowed to move to a public cloud later.
Also, we can mirror or replicate most of the cool things we want from the public cloud offerings in Kubernetes. We can have function as a service and hosted databases with less operational cost for developers. And we can leverage all the different cloud native technologies that exist for monitoring, service meshes and security.
I forget all the animations I have. And of course, we can reduce cost. Norway isn't the biggest country.
So we kind of have a traffic limit. At least, we can estimate our user growth quite well. We have approximately five million users now, and we'll have approximately five million users for quite a few years, unless there's some kind of war situation where Norway takes over other countries, but I don't think that'll happen. So it's more important for us to be able to run our applications efficiently than to handle large increases in scale.
So to conclude: moving to Kubernetes is a good stepping stone for moving to a public cloud, and we get loads of value early on in the process.
The next problem we had: too many test environments. More than 20 distinct test environments, loads of them with production data, which means some of the data about all the people in the witness protection programs that we really, really don't want anyone to see. And all those environments are difficult to coordinate.
I think it's a loop, but I'm not entirely sure. It could be just a really long GIF.
Yes, so we had all these test environments, and coordinating them is difficult. There are three or four people whose job it is just to make sure that we have control over which version of each service runs in which test environment. And we can't have a copy of all the legacy systems in each test environment, so we have to have some system to handle all that. I think the reason we have all these test environments is that we have these really long release cycles. When we have three or four development projects, each going to release something at the same time, four times a year, it's difficult for them to integrate before that, so they need separate environments and then some kind of test period afterwards.
So what did we do to solve this?
Basically, we created two Kubernetes clusters in our solution: production and not production. In not production we made it possible to create all these different test environments using namespaces. But every time someone asked for a new test environment, we said no at least twice, so they really had to have good reasons. But of course, you can't just take away all those opportunities at once. We want people to actually use the things we build as well, so we have to make it easy once they've described to us that they really need this. So the provisioning of these namespaces afterwards is automatic.
The diagram is supposed to show that there are many instances of many applications running in not production, but in production there's only one of each. This makes it possible to create the test environments as we need them. We can have some isolation, and we can have limits on the namespaces, but it's still easier to reduce the number afterwards, and we push back a bit so that people think about why they need all these test environments.
So the next problem was access control. This is more or less what it looked like when I left Norway on Monday morning. The access control in our old system was basically that you either had access to production or had no access to production. The operations people had access and the developers didn't. And there was this handover process where, if you wanted to do anything in production, you had to hand over the application to the operations people. That didn't really work, and it took away a lot of the trust between the different departments.
So what we basically wanted to do was to reduce the distance between the people who caused the pain and the people who had the pain. The developers made errors, but they didn't really feel the pain of those errors. That was felt by the operations
people. So we wanted to let the developers feel all the pain they created themselves, but then we had to make it possible for them to get access to production. We couldn't give them access to everything, because we have sensitive data and we need some kind of control. So we implemented OpenID Connect with Azure Active Directory, which has the IDs of all our developers, and then we just used Kubernetes namespaces to say that the development teams have access to the stuff in their own applications.
So, network zones. We had three or four different network zones with a firewall between them, which was operated manually. When you have that for 10 or 15 years, there really is no control anymore. You just have loads of holes in the firewall, and you don't really know where they are. And it's difficult and slow to open the firewall for new applications. So basically we want to automate all this, and we do it by using network policies in Kubernetes. When we have namespaces for each of the different teams, we can, in a declarative and fully automated way, recreate all the actual rules from the firewalls and have the necessary isolation between the different teams' applications. We're also looking at SPIFFE and Cilium, all the new cool stuff I've learned about in the last few days, to take this even further.
Also, we had all these architects, and architects like to draw diagrams of boxes and arrows showing how everything fits together. The problem, of course, is that it is difficult to do that correctly.
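Returning briefly to the network policies mentioned above, here is a minimal sketch of what a generated per-team rule could look like. It is not NAV's actual configuration: the function name `team_network_policy` and the namespace names are hypothetical, and using Python to generate the manifest is an assumption made only for illustration. It builds a standard Kubernetes `NetworkPolicy` that lets pods in a team's namespace receive traffic only from the same namespace and from explicitly allowed namespaces.

```python
import json


def team_network_policy(namespace, allowed_namespaces=()):
    """Build a NetworkPolicy manifest (as a dict) for one team namespace.

    Pods in `namespace` accept ingress only from pods in the same namespace
    and from the namespaces listed in `allowed_namespaces`.
    """
    # An empty podSelector in a peer matches every pod in the policy's own namespace.
    peers = [{"podSelector": {}}]
    for other in allowed_namespaces:
        # Assumption: namespaces carry the `kubernetes.io/metadata.name` label,
        # which Kubernetes sets automatically from version 1.21 onwards.
        peers.append(
            {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": other}}}
        )
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "team-isolation", "namespace": namespace},
        "spec": {
            "podSelector": {},  # apply the policy to every pod in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [{"from": peers}],
        },
    }


# Hypothetical team names, for illustration only.
policy = team_network_policy("team-pensions", allowed_namespaces=["team-frontend"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. Because the rule is generated from a short declaration, it can live in version control rather than in a manually operated firewall.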
Most of the time they do it at either development time or deployment time. They talk to developers, or they try to do some static code analysis to see who talks to whom, and that's never correct. Normally these diagrams end up on Confluence pages that no one looks at, because there is really no value in having an incorrect diagram like this. What we want to do here is to introduce Istio, because Istio, along with a lot of the other stuff it can do, makes it possible to see a real-time, absolutely correct image of who talks to whom. Then we have these diagrams showing the actual traffic patterns, and we can easily add loads of stuff onto them, like whether anything is slow and how much traffic there is. And when we use OpenTracing, we could probably also do this for our asynchronous communication.
So the next problem we had was monitoring. The monitoring solutions we had were mostly made for the operations people and the bosses, because those are the ones who are scared, the ones whose heads are on the block when something goes down. That meant we had a separate team creating the monitoring solutions, and they didn't really talk to the developers at all. Logs were used more than metrics, and most of the monitoring was on the infrastructure side, not on the service side.
Here what we did was quite simple: we introduced Prometheus as a metrics database and metrics collection system. When you collect data using Prometheus on Kubernetes, you get all the metadata from the deployments on the time series. That metadata makes it possible to create generic dashboards like the one you can see here, which shows data for any application running in your cluster, but where you can drill down and see data for only one.
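As a rough illustration of why that metadata allows one generic dashboard, here is a small sketch. It assumes each time series carries `app` and `namespace` labels derived from the Kubernetes metadata; the metric name `http_requests_total`, the application name and the helper names are hypothetical. The same query template then works for any application, and only the label values change.

```python
def _escape(value):
    # Escape backslashes and double quotes so the value is a valid PromQL string.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def selector(metric, **labels):
    """Return a PromQL selector such as metric{app="x",namespace="y"}."""
    matchers = ",".join(
        '{}="{}"'.format(name, _escape(value)) for name, value in sorted(labels.items())
    )
    return "{}{{{}}}".format(metric, matchers)


def request_rate(app, namespace, window="5m"):
    """Per-second request rate for one application, summed over its pods."""
    series = selector("http_requests_total", app=app, namespace=namespace)
    return "sum(rate({}[{}]))".format(series, window)


# Hypothetical application name, for illustration only.
print(request_rate("pension-calculator", "default"))
```

In a Grafana dashboard the `app` and `namespace` values would typically come from template variables, which is what makes a drop-down over all applications possible.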
So we created this Grafana dashboard, which shows the generic information but has a drop-down with all the applications running. As soon as you deploy an application to our cluster, you can go to the Grafana dashboard, pick your application, your environment and your cluster, and see all the data. We use the default exporters for Prometheus, so we get JVM stats and node stats and stuff like that, and CPU and memory data comes from Heapster. Using this data, we can even quite easily create some kind of billing solution where we measure resource utilization for each of the different systems, because it's all reported in exactly the same way. So we can create a billing dashboard, basically introducing something we haven't had before: that the development teams have to take into consideration the cost of what they're creating.
So the next problem: it was difficult to create new apps. I talked a bit earlier about the old private cloud, where all the provisioning was done manually in homemade web applications. You had to go to some systems to create the VMs, the applications and the databases, and to other systems to create configuration for the different applications. And to actually deploy, you had to create a ticket in JIRA, and the ops people did the deployment.
So we created a separate application called the NAIS deployment daemon. It is a Go application running inside the Kubernetes cluster. When you want to deploy an application, you do an HTTP POST to its endpoint. The file you can see on the side there shows all the different things you can configure, but there are loads of sensible defaults, and we created this specifically for NAV. So normally there are like 5 to 10 lines of YAML for an application, just saying what you need. Then the NAIS deployment daemon fetches this file and the Docker
container and deploys it. The fact that we use Go and have created this application ourselves makes it possible for us to tailor it specifically to our needs for deploying an application. We can use it to integrate with some of the older systems. The system we have for application configuration, for example, has an API as well as a web GUI, so we can reuse all the application configuration, and that makes it much easier to migrate applications to our new system. Then, when we're done and have a majority of the applications inside our cluster, we can try to reduce the need for some of that configuration. It also integrates with all the other systems. For example, it turns on Prometheus scraping if the application exposes Prometheus metrics, and it handles everything.
There have been loads of discussions this week, I think, about how to reduce the amount of YAML you need. I've seen loads of systems that generate YAML into Git repositories and then run kubectl apply. I say "kubectl" even though Brian Grant said something else. But I like this approach much better, because it makes it possible to have everything inside the repository of the application. We have this file together with the application files and the Dockerfile. One of the things I like about this is that then it's not my responsibility. When it's inside the developers' Git repository, it's the developers' responsibility, and it's they who own the application and all the configuration needed. And we get the same advantages, with a Git commit log and everything, for all the changes. So the NAIS deployment daemon takes this information and transforms it into Kubernetes resources.
So, batch jobs. Because of our old architecture, we normally run batch jobs at night. And running batch jobs at night is bad in many, many ways. First of all, someone has to be up at night to check that everything goes okay. And the developers work days.
So it has to be a separate team of people who run all the batch jobs, and they don't understand the batch jobs, because they haven't been part of the development team. There's resource contention, and it's difficult to scale, because we normally also put our batch jobs inside the application instances, so a batch job is just a call to an endpoint in the application. And then you can't run it during the day, because that's when the users use our systems.
So instead we're trying to introduce batch jobs as a concept in NAIS, where we have a separate NAIS job YAML file and run the batch jobs as separate containers. Again, this is the same kind of thought process as behind the NAIS YAML file, where we have sensible defaults and try to reduce the amount of YAML any developer has to write.
So those were the problems we had and our solutions. Now I'm just going to spend a few minutes talking a bit more about NAIS. This is the architecture, which I think I'll talk most about. We try to layer it so that it's possible to run on many different platforms.
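Going back to the job file for a moment, the "sensible defaults" idea can be sketched as follows. This is not the NAIS implementation: the field names, the default values and the function `cron_job` are hypothetical, and it assumes a batch job maps onto a Kubernetes `CronJob` running in its own container. A developer supplies only a name and an image, and everything else is filled in before the Kubernetes resource is generated.

```python
import json

# Hypothetical defaults; a real platform would choose its own.
JOB_DEFAULTS = {
    "namespace": "default",
    "schedule": "0 12 * * *",       # every day at 12:00
    "concurrencyPolicy": "Forbid",  # never run two instances at the same time
    "restartPolicy": "OnFailure",
}


def cron_job(spec):
    """Expand a minimal job description into a Kubernetes CronJob manifest."""
    missing = [key for key in ("name", "image") if key not in spec]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    merged = {**JOB_DEFAULTS, **spec}  # values from the developer win over defaults
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": merged["name"], "namespace": merged["namespace"]},
        "spec": {
            "schedule": merged["schedule"],
            "concurrencyPolicy": merged["concurrencyPolicy"],
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": merged["restartPolicy"],
                "containers": [{"name": merged["name"], "image": merged["image"]}],
            }}}},
        },
    }


# Hypothetical job name and image, for illustration only.
manifest = cron_job({"name": "nightly-report", "image": "registry.example/report:1.0"})
print(json.dumps(manifest, indent=2))
```

Two lines of input become a complete resource, and a team that needs something different overrides a single field, for example `"schedule"`, instead of writing the whole manifest.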
We have the applications on top, and then we have what we call the platform applications, which is stuff like Rook and Istio and Grafana and Prometheus, and a few operators, mostly written in Go, running inside the cluster doing stuff. The fact that we base all this on the Kubernetes APIs makes it possible for us, as I said before, to run on premise now and migrate to different cloud providers later.
Right from the start we focused a lot on continuous delivery of everything we do. So we have a pipeline that deploys the cluster. We use Ansible and Jenkins for that, so when we make a change, it's more or less automatically propagated into the cluster. We have the same for the platform applications. We use Helm for those, because then we can leverage all the stuff in the Helm stable repository, and we use a small open source tool called Landscaper, which makes it possible to do continuous delivery of Helm charts.
This is my second go at migrating a large system to Kubernetes, and I've realized that creating the platform is so much easier than migrating the applications, probably because we want to increase the quality whilst we do this. Just setting up a platform would take us a lot less time if we didn't have to tailor it to the fact that we have to migrate applications. So the platform we really want to build is going to be the next step. When we've migrated most of the stuff, we can try to create the perfect application. No, the perfect platform. If we're still there, of course. And the fact that we can reuse parts of the private cloud makes it much easier to migrate.
For storage, we've just started to use Rook, and that works really well. I've heard rumors that it's going to be a CNCF hosted project really soon. I've always thought setting up distributed storage was difficult and scary, but using Rook makes it incredibly easy. We also want to offer Postgres using the Postgres operator.
Then we can move away from our old Oracle stack.
We open sourced everything. Not necessarily because we think anyone can reuse what we do, but because we're a tax-funded government agency. We think what we do should be open and public, because it's basically our interpretation of some laws. The open source model also works really well internally. I normally give cake to anyone outside my team who creates pull requests to NAIS, and cake works really well for getting people to do stuff for you. I've had loads of success with that over many years. At FINN I managed to get the whole organization to do weeks of work just because I promised them cake. And any team that migrates an application to NAIS and gets proper traffic in production gets cake. It works because it creates enthusiasm and it creates visibility.
So, a few conclusions. I think Kubernetes is a really good product for creating platforms that shape cultures and organizations, because it's kind of an implementation of continuous delivery. It makes it easier for us to deliver, and all the things the teams want us to do, we can deliver quickly because of Kubernetes.
When you do something like this, it's much more important to focus on the migration of applications than on creating the platform, because that's where you're going to spend most of your time.
Building a brand like NAIS, giving out cake and creating this internal open source enthusiasm really helps. In both of my two projects doing this, I've never asked anyone to migrate their applications. They come knocking on the door straight away when we start to talk about what we have.
And lastly, if you want to build a "you build it, you run it" culture, where the developers do most of their operations themselves, Kubernetes gives you loads of good tools for that, and it also reduces the amount of work you have to do.
If you're ever in Oslo, Norway, and have anything you
want to share in the cloud native space, I run the cloud native foundation meetup in Oslo, so you can contact me.
Yeah. Well, I think it's difficult to describe culture, but I think lack of trust was a really good description of where we were when we started this journey. The first thing that happened was when developers wanted access to production, because the ops guys didn't want that. But that's what we wanted to achieve, and that's kind of the first place where we saw that we had to change the culture. And we managed to create some kind of system together with the ops guys, because a lot of our developers haven't really got much experience with running stuff in production either. So we need to train them as well, and we try to make the ops guys help with that. Also, we need to start small. You can't just put the most important thing in production right away. You have to build trust, and people have to see that this will work.
Yeah, other questions?
All the mainframe developers keep telling me, "We had this 50 years ago." And of course some of it is true. They have this shared pool of resources, and how they use it is very similar. They even had all the billing stuff. When I started to say, "Oh, we can do billing with this," they just started laughing, because this was built into the system from the start. So obviously the concepts are similar, but this is on the data center level, so what you can achieve is different, I think.
Yeah. Rook sets up Ceph in a way that handles persistent volume claims and everything. And no, we haven't used it for too long.
This was the last slide I made, basically, before coming here. We have a few apps using it, but the main guy who did the work says that at least it's easy. Let's see if it works as well. But yeah, Ceph works, and I remember from Berlin that the Rook people said, "Well, Rook is just for setting up Ceph. It's not on the data path." After that it's just normal Ceph.
Yeah, sorry? Docker? Docker, yeah. No, I kind of wanted to, but I've never had anyone tell me they've done that successfully yet. Loads of pushback.
But I think we're not done here. We know where we want to go, and we've done some of it, but we know... Oh, sorry. She said stop. I'll just finish. We know this is going to take years, because some of the systems have to be rewritten from scratch, and those are some of our core systems. So a lot of the old processes will be there for many years.
Thank you.