Hi, this is More Power, Less Pain: Building an Internal Platform with CNCF Tools, and I am nervous. My name is David Sudia. This is actually a sequel to a talk I gave last year called Balancing Power and Pain, which I gave with Toni Rib. It was about moving a single application from a platform as a service over into Kubernetes. When we finished that talk, a bunch of people came up and said, "Oh, so you built a platform to deploy apps?" And we said, "No, we moved an app. The platform is coming; come talk to us next year." So this is next year.

Who am I? I'm a senior DevOps engineer. I build clusters, deploy databases, write utility applications, and help with architecture. This year I've been leading the effort to make a more developer-friendly platform inside our company. That was prompted by one of our lead developers coming up at a hackathon last winter and saying, "I don't really know what my project is going to be, other than figuring out why this is all so hard, and how to make it not so hard." That was the catalyst for saying: okay, we can operate now, but it's not pleasant. We need to work on the developer experience and make this better.

I work for a company called GoSpotCheck. It's a field execution app: you have data collection missions that you fill out on a mobile application, and that data goes back for near-real-time business intelligence analysis. For context on our size, we've got about 40 engineers and QA folks. We started as a Rails monolith, and in the last couple of years we've branched out into Go microservices and Node-based cloud functions. We run Postgres and Couch and Kafka, and we have a Scala-based data pipeline. Lots of different technologies, which is really why cloud native made sense for us as we migrated out: we believed it would end up supporting everything that we did.

So that's the context on why we're building this platform. Why CNCF tools?
Well, we started on Heroku, a platform as a service. We outgrew it, and kudos to Heroku for enabling us to get too big for Heroku. We love you guys. We decided to move to Kubernetes to future-proof ourselves; for a lot more detail on that, please watch last year's talk. One of the reasons for moving to Kubernetes was the ecosystem. We figured we were going to a place with a lot of surrounding tools that would fill the gaps of utility we needed, maybe not right now, but soon, and that it would be a long-term bet that would pay off in terms of interoperability and efficiency. I do think it is paying off.

But there was struggle. Going from a monolith on an all-in-one platform, with one or two teams that were very close, to lots of teams working on lots of services in a distributed environment with open-source tooling has been difficult. The best metaphor I could come up with is from my experience as a parent: it has really felt like going from the slick stroller system to a bunch of toddlers poking each other in the car. There have been growing pains.

One of the biggest difficulties we've had was just resourcing. We're a smallish mid-sized company. My team, the operations team, is 2.2 people, the point two being my manager, who tries really hard to be an IC but has to manage. We used to be the DevOps team; now we're the ops team, because there's too much ops to do and we can't really do any of the dev stuff. We had a platform team at one point, but it ended up dispersed out into the feature teams. So there were actually no resources dedicated to building this platform; it's all been done with whatever time we can grab from people and get them to contribute.

And it's just taken time. It's taken us about two and a half years to figure out what our platform even needs to have, much less how we'll build it and what pieces we can put together to compose it, and then to go through the iterative process: we've moved one thing in, now we've moved lots; what's missing, what do we need? Going through that discovery process.

So there's been this pendulum swing from simplicity to complexity, and then back toward the middle, that I want to talk about. When this was going to be a breakout session, I had all the logos here and was going to do a deep dive into the stack we created. But the deeper message for me, going through this, is that the CNCF has lots of options for every kind of tool. There's probably more than one implementation, and you can really build and compose a stack out of whatever fits your needs. So rather than paying attention to the specific choices we made, the point is that it's possible to do this composition of a platform now.

I've had this mantra for the last couple of years: if you can wait six months, you should. The ecosystem is moving so fast, and the tooling is being developed so fast, that in six months everything is going to be a hundred percent easier. I'll get into my case study for that in a little bit. But I've been saying this over and over again, and I feel like it's starting to change.

So we started at the simplicity end of the pendulum. Simplicity is great, but it has limits. We were able to deploy really quickly; everything was sort of there for us. But we hit a point where we had maxed out the number of servers, the Postgres configuration was slowing us down, and we needed more control and customization over our environment. So we swung the pendulum all the way over to the other side: power and complexity. We had the ability to do whatever we wanted, but, you know, it's like the dog catching the car. What do we actually want? That's the question we ended up having to answer, along with: do we have the resources to do it all?

Going to that side, we get into Kubernetes and we realize: oh, we need metrics. Okay, great, Prometheus is there for that. We need to be able to see what's happening between our systems; well, distributed tracing is there for that, Jaeger's there. Okay, cool, we've got that covered. But then you get into the smaller nitty-gritty things: how do we actually secure this properly? And hey, I would expect that if I change my environment variables, the pod is going to restart, but that's not actually true. How do we get pod restarts when the config changes? It all starts to feel overwhelming.

That's why I'm so excited that things are swinging into a happy medium, where the tooling is there, it takes less work to get the tools working together, and, I think critically, the vendor support is catching up. I'm going to give you some concrete examples of this.

Observability: a couple of years ago, we were running our own Prometheus and Jaeger. Rails support for both was meager or non-existent, so our observability infrastructure ended up split between an APM solution and the cloud native stuff. Some of the cloud native stuff was behind a VPN, some of it wasn't. Most of our engineers had never been on a VPN before, so that was confusing for them, and the cloud native stuff was falling over occasionally, or commonly, because we didn't have dedicated eyes on it; we were doing fifty things trying to set it all up. It wasn't a great experience.

Now we actually have a vendor who fully supports the cloud native stuff. It's not an external metric that gets charged at five times the cost. They give us a Helm chart; we deploy the chart, it stands up Prometheus and OpenTelemetry, we send our stuff there, and it forwards it on to a central place where everyone can look at it. And we've got great tooling around it.
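On the pod-restart question I raised earlier: the common workaround in the Helm world is to hash the rendered config into a pod-template annotation, so that any config change alters the Deployment spec and Kubernetes rolls the pods for you. A minimal sketch, assuming a chart that has a `configmap.yaml` template next to the deployment (names here are illustrative, not our actual chart):

```yaml
# deployment.yaml (Helm template) -- sketch only
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  template:
    metadata:
      annotations:
        # Checksum of the rendered ConfigMap: when its values change,
        # this annotation changes, the pod template differs, and
        # Kubernetes triggers a rolling restart automatically.
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
```

Operators like Stakater's Reloader solve the same problem by watching ConfigMaps and Secrets directly, but the annotation trick needs nothing beyond Helm itself.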
It's been a fantastic experience, and that's only really been possible in the last year, year and a half.

Service mesh and ingresses and the rest: this was my case study for "if you can wait six months, you should." Three years ago, I was writing my own API implementations for Envoy, because it was just a spec; there was nothing actually distributed for it. I was almost done when they released v2. Then I was almost done with that, and Istio came out. That was really exciting, because it was just going to do it all for me, but it was still hard to deploy and really complex, maybe more complex than we needed. It's grown to the point that, for most of these things, like Linkerd, which we use, you can deploy a production configuration with a single CLI command. The progress is just astounding, and it has made it so much easier to be in this space, to play with things, and to get things out quickly.

The standards are getting better, so interoperability is actually becoming a real, possible thing. We use Ambassador for our gateway, and it plays very nicely with Linkerd. Both of those play nicely with OpenTracing and OpenTelemetry, so we can get observability into all of this really easily. It's nice. It's really nice.

Building and deploying: this is probably the number one value-add we've had this year for our developer experience. When that lead developer I mentioned was talking about how hard everything was, he was really referring to: I've got to write seventeen YAML files, and I've got to write this Dockerfile, and then we have to maintain all of it. We've cut all that down by abstracting it out with standardized Helm charts that we've written for our organization, for our Go microservices or a Rails service. Now developers just put in some values and deploy. It's getting much closer to being a pleasant platform.

For Dockerfiles, we're using buildpacks now. No more Dockerfiles to write or maintain. Ironically enough, we're using Heroku's buildpack stack, so as we migrate out the last long tail of our applications, we don't even have to change the Procfiles. It's seamless to take things from where they were and deploy them into Kubernetes. It's been really pleasant and has made things much faster.

The thing I'm most excited about, honestly, is the marketplace that's being created. I've actually gotten surprise from people when I say that we want to buy, not build. I think that's partially because a lot of the most vocal people in this environment are the big players, so thanks to KubeCon for giving me a keynote slot to give this perspective. There's a space between the five-person org and the five-thousand-person org where there's room, where we want to buy and not build. We're too big for a platform as a service, but I was talking with a VP at a Fortune 500 tech company, and his team that is extending Tekton is twice as big as my whole team. They can build. We need to buy, because we can't be running all this ourselves.

A little bit about the process of how we built this platform. The first, most important thing we did was treat it like a product. Your internal dev tools are a product. Any platform you build inside is a product, and you have to treat it like one. One of the biggest pitfalls of platform teams that I've seen is that they build things for themselves. Platform teams tend to be composed of the wonkiest engineers, or some of them, and it ends up being a complex Docker orchestration stack for local development, powered by Makefiles written in Vim. I'm one of those wonky people. I like Make; don't at me. But that's not what everyone consuming the platform wants to use. So I didn't get a product manager, but I got part-time help from a product manager for two months, who helped me interview our internal users, our engineers and QA and support, and we put together personas. Then he kind of downloaded a bunch of product management information and strategies into my brain, which we've used over the rest of the year to help inform what we build.

The next thing we've done that's been really critical is harnessing Conway's law. We know that your technology will eventually reflect your people and your teams. We tried to hack that a bit and make cross-functional working groups with representation from the personas we created. So if we had a persona of the junior engineer, or of the senior engineer who just wants to ship stuff and isn't really into configuration and getting their hands dirty with DevOps stuff, those people need to be represented in a group working on, say, how do we do observability, or how do we improve our local development? Those groups have defined most of the work to be done. As the operations team, we stood back and acted more as the library of tools: here are all the things we know about that will let you deploy locally into the cluster.
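To make those "single CLI command" and "library of tools" points concrete, here's roughly what the bootstrap side looks like today. This is a sketch, not our actual script; the builder and registry names are illustrative:

```shell
#!/usr/bin/env sh
# Sketch of a developer-bootstrap flow.

# Linkerd: render a production-ready control plane and apply it,
# then verify the installation.
linkerd install | kubectl apply -f -
linkerd check

# Cloud Native Buildpacks: turn app source into an OCI image with no
# Dockerfile, using Heroku's builder (which honors an existing Procfile).
pack build registry.example.com/myapp --builder heroku/buildpacks:20
```

The point isn't these exact commands; it's that each piece of the stack now installs and verifies itself in a line or two, instead of requiring hand-written manifests.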
Then those groups can go research and figure out what they want to use, and they're able to become the experts and make the choices, which is really impactful, because then people actually want to use the result. Since we don't have any dedicated team for implementation, each working group has a board. We generate stories, and people pick up those stories as they can, to, say, write the script that installs all the tooling you need to interact with our service mesh.

The final thing I want to talk about is the promise of this ecosystem, which I think is starting to be fulfilled. Last year, my favorite keynote was from Bryan Liles: when is Kubernetes going to have its Rails moment? He called out my favorite quote, which is that Kubernetes is necessarily complex. I don't think we're quite at a Rails moment, but as a relatively early adopter, I think we're getting a lot closer. It's not exactly the stroller I showed earlier, but it's getting to be a little bit more like this necessarily complex squirrel feeder. It does a lot of stuff that is important. We still had to put it together, we had to compose it, but it looks and feels a lot more professional, and it's a lot more pleasant to use.

So my mantra is changing. It's now: if you can wait six months, you might actually be good to go right now. The tooling is at a point, in terms of maturity and interoperability, where you can dive in as a mid-sized org and compose something that's going to serve your needs probably pretty well. One of the things I'm most excited about is seeing actual platforms as a service appear here every year, because I really, truly believe that if you are a five-person org, you should not be in Kubernetes; you should be on a platform as a service. But now you can still go places and say "we use containers and Kubernetes," because behind the scenes you actually are.

Thank you for watching. I'm @thedevelopnik in all the regular places. Since I'm not able to step down from a podium into a crowd of my peers and chat, which I really miss: if you got something out of this, or you wish you had, or you want more detail, I would be overjoyed if you reached out to me. I love talking about this stuff. Thank you so much for watching.