We're a FinTech that started four or five years ago, got our Series C funding last year, and officially became a unicorn, so it's been a fun journey as we deal with that scale. [brief microphone troubles] So what I was saying is: we are a FinTech that has had to rapidly scale up, got Series B and Series C funding all within about a one-year timeframe, and had a tremendous change opportunity that Backstage has been a huge part of. Before that I was working on a cloud migration for a healthcare SaaS company, dealing with HIPAA and all the highly regulated issues, and I also brought to market their first Kubernetes-based environment for their microservice chat/IVR product. We'll talk about that a little bit. And before that I spent a lot of time doing consulting, selling my soul and whatnot on enterprise architecture engagements, document and content management, and those sorts of things.

So today we will talk a little bit about Caribou's story, because I think that will help frame some of the challenges, and then we'll also talk about how developer expectations have changed and how that impacts how your organization will change. Through that we can understand the obstacles and then start diving into how you can design a cloud native platform centered around Backstage to accelerate the growth, or accelerate the re-platforming, of your organization. It's rooted in convention and it's rooted in iteration.

Now, as I mentioned before, when I started at Caribou we were primarily focused on cash flow: how do we plan a two-to-three-year ramp-up cycle? But the pandemic had the unintended consequence of a lot of people wanting to refinance their cars. Back then we were known as MotoRefi, and that was our primary line of business. The application was built on a Ruby monolith, and to this day deployments are still painful with the monolith, and yes, the massive database still causes a lot of N+1 queries. It's fun. At that time we had also started moving off of a SaaS platform and a variety of SaaS tools that made sense when there were only about 12 engineers, but as we were growing we quickly needed something more robust. About a week after I started we got more funding, and now we had to accelerate and start growing rapidly. We were onboarding engineers at a phenomenal rate. It was very confusing for a lot of engineers, but we got through it. We also had 10X growth, both in terms of the business and the pressure on the system. Interestingly, because this was a monolith, a lot of the pressure came from our internal employees even though it was also serving a public website. Those inefficiencies and those pressures made it clear that there was both a pressure cooker forcing us to decompose and an opportunity to support many, many more engineers working on a diverse platform. That's when we started moving onto Kubernetes, Argo CD, Argo Workflows, and that sort of thing, and started thinking through that platform mentality and how we would advance it to support what we expected to be another 10X of growth in the coming years.
And how can we also support where we are now, as the economic environment has shifted toward cost savings and a more strategic mindset, so that we know what we are developing and investing in is something developers want to use and our customers can benefit from? And how do we make it so that our business becomes predictable and the platform really is stable and robust?

From the beginning our culture has been centered around being metrics-driven so that we can be experimental, so that we can learn new things, so that we can innovate. Our CTO has been very inspired by the Spotify culture; we frequently reference some of the open source materials and videos. But we also have the challenge that we are in a regulated industry, because we're dealing with finances. We have people's credit reports. So how do you enable those things? I'm glad there were some questions around that earlier, so I added a slide later that we can talk about.

Also a cornerstone, especially when you look at a lot of the DORA and Accelerate research, is psychological safety. When engineers feel scared, when they feel they have to basically kill themselves to hit deadlines, when they have that pressure, when they're afraid of making mistakes, we all know that's how you get an unstable platform. That's why we also frequently hear Mr. Rogers quotes coming from our CTO. And it's been very clear since I've been there that being heavy-handed is not an option. Rigidly defining the architecture is not an option, because that goes against the principles of psychological safety and innovation. So how do we move that change forward in a slightly more organic way?

There are certain challenges that keep that from happening. We've already talked about the learning curve a lot today, so I'm going to move past that. But suffice it to say, I think it is impossible to guarantee that any engineer you hire has experience with the complete tech stack you might be working on. It's just so diverse now; you can't predict anything. There is also the other mentality shift, which has proven hard but is manageable: the way you can use the Scaffolder and use Backstage to build trust allows for a mind shift from a custom job shop, where everything is a Herculean effort or custom-crafted, to shipping small features, including new architectural patterns.

And then there are developer expectations. In the last three years I've noticed that where we used to be able to say, "eh, there's a little pain, it's a little dicey, that parameter, yeah, we didn't document that so well, but you can figure it out because your buddy will tell you," the quality of the tools has gotten so good that this is no longer reasonable. It is a huge point of frustration for engineers when there is that friction. They expect the same amount of polish they would get on their personal projects, even in that heavily regulated environment.

One of the lost opportunities from one of my former places of employment was when they were starting to roll out one of their first services. This was an IVR system, and the idea was that you could call in and it would automatically send a fax to your healthcare provider.
Now, you might ask yourself why anyone is still sending a fax within the last ten years, but nonetheless that's where we were. It seemed like there was going to be this great opportunity, because the use case was really simple and there were already fairly clear boundaries, so it would not have been too difficult to pull out a document service for all the templating that needed to occur; a fax integration with a third-party provider, deployed as a separate container, really easy to test and monitor; and then more of a command-handler, aggregation type of fax service to actually pull all the business logic together. It seemed like it was going to be a slam dunk. It was not. Instead, maybe you have seen this: everything was crammed into a single container. There were not even clear boundaries between controllers and services and all this stuff. Cron jobs were executing in the same container as the web server, and the business logic and all the domain stuff was just jammed in there. Why is this? I was thinking, this is so easy, I know Kubernetes. The engineers did not.

And so Kubernetes is actually getting a bad rap. I've noticed this from some of our newer engineers, who have been in the workforce maybe two or three years, who are willing to say that the complexity and overhead are just not worth it. I disagree, because of the architectural patterns it unlocks. But people actually think Kubernetes is making things unstable instead of more stable; which, of course, if it's not configured right, will happen. Also, standing up services takes too long because you have to wire up a whole bunch of different pipelines. You use copy-pasta. It's very confusing. And it takes forever to chase through those mounds of YAML.

When you're also trying to change your organization's architecture, where it used to be clear what class to go and modify, you now don't know. What do I name things? Do I put it in an existing service? Do I move it somewhere else? It's a simple question on the surface, but as we all know, naming is one of the hard problems, and that takes up time and causes friction. And for engineers who have not had to work in a regulated environment, there is the idea of having to deal with a platform that now has security, and minimal-access security at that: there are private VPCs that you can't just directly access from your local machine; there are processes you have to go through to be able to know that the code is secure. These are all points of friction that cause people to not want to use more advanced, more distributed architectural patterns.

There's this confusion where the right way and the fast way are seen as mutually exclusive. "In order to get to market quickly we can deal with the -ilities tomorrow, because right now we just need to slam the code in where we can get it deployed quickly. We don't have time to deal with that other stuff." But ironically, the shortcuts that teams choose, time and time again, actually cause them to ship slower, because now you're moving from what would be a nice distributed system into a distributed monolith that is tightly coupled and has to have multiple components moving at the same time. And honestly, can you blame the engineers for getting frustrated when those situations arise? Because now there's this association that this more distributed, microservice, Kubernetes thing is kind of a pain in the neck.
But I think it could really be re-expressed like this: Kubernetes isn't so much hard as it is a learning curve of unknown size. Problems with naming mean there is probably a problem with an unintuitive architecture. Things taking too long means the scope of the project and the development methodologies might need adapting. And things slowing down comes from a lack of understanding of the advantages, or of how to implement event-driven or microservice-based patterns.

And that's really where Backstage comes in. As others have already expressed today, it makes things much easier for engineers to understand and gives them visibility into the system, and it also reduces the complexity dramatically. Whereas we used to say that DevOps was the bridge, or the thing that sat in between, or the thing that overlapped, or shift-left, or whatever you want to call it, between developer knowledge on one side and infrastructure knowledge and automation on the other, now, moving into the platform world, we're seeing that Backstage serves that role and almost completely eliminates the concept of a dedicated infrastructure or automation team, because it is a service-provider mentality.

So if you're going to roll out a platform, you might as well have a little fun, and that's why we created ANTLERS. Any good acronym needs a backronym, so we came up with Automated Normalized Toolchain for Launching Easily Releasable Systems, just to make it a little fun. And I'm glad someone asked how much you need to support this, because we have about 60 engineers and five DevOps engineers, across about 9 to 12 developer teams depending on where we're at, and we're effectively filling three roles: SRE, automation and dev tooling, and architectural advisors for a lot of the teams, because we have a lot of experience. When I brought Backstage to my team, they said, "No, have you lost your mind? There is no way we have time for this." So luckily we found a managed provider in Roadie, and, shameless plug, they have been fantastic to work with and allowed us to adopt this quickly without taking on the overhead. One of my big fears was breaking APIs and patch maintenance and those sorts of things, and the burden that would bring, so I was very excited when we found this and saw we don't have to deal with any of that. It's also been a wonderful working relationship to have people who truly understand how to implement this and who always ask thought-provoking questions about how you might want to adapt your system catalog and those sorts of things. So, a really great partner to work with. What we call it internally is "the barn," because that's where it all comes together for all the caribou to hang out.

When we are architecting, we are architecting for clarity. When we start out, we do not know what the path is. It's kind of like a mountain in the distance: you don't see all the crags and valleys and the obstacles that will come your way, but you at least know which direction you're starting in. Our goal is to make it easy to know: do I go left? Do I go right? We will wind up making mistakes, but they will not be the big problems; they'll be the regular ones that we're all, of course, going to make. And that's my cat, Sam. The key here is that if you have a simple and intuitive architecture, you can also have a simple and intuitive implementation of the Scaffolder.
And so this is where we started rolling this out, with me thinking in the back of my mind, and talking with my team, about how we can avoid the challenges we've seen before. When we decided to roll this out, and when you're doing any architectural change, it's really important to set boundaries: what is the scope of the old world versus the new world, and how do we onboard people? Because of that expectation of polish, if it's rocky you will lose confidence.

So, keeping it simple: we have a management cluster that effectively serves as the control plane, which then manages workload, analytics, and monitoring clusters. Those are boundaries that we've decided make sense from an intrusion and risk perspective and all that good security stuff. And we rely heavily on Argo CD, Argo Workflows, and infrastructure as code. With Argo CD, pretty much everything in our environment is modeled as an Application, including workflows and including Terraform jobs and whatnot. Everything is YAML and templatable. Now, we could have done something more complex, but the reality is you have to gauge the speed at which you need to move things out against the learning curve of your own team for implementing more complex models, and YAML is easy enough. Through that, Argo can deploy any of our resources, and because we also take advantage of Config Connector (we could just as easily have used Crossplane), we are provisioning all of the infrastructure as part of the Argo CD applications and packages. Great for templating.

The key here is that when a YAML engineer or DevOps engineer doesn't think through convention, you will be entering a world of pain, and I think we've all been there. That's where, if you can build intuition, if you can build your architecture into the structure of everything so that it permeates your entire environment (or, if you have an established platform, only the part of your environment that you've decided is in scope), then you start having consistency across your domain design, your catalog, your repos, your clusters, your namespaces, your services, all the way down. It makes it easier to find things and it gives us a significant advantage. We can limit choices about where things live and what things should be named, making it easier and reducing the cognitive burden by including drop-downs, by taking away free-form fields, and by making it easy to understand how these things map. So there's a clear mapping between the system catalog root, our domains, systems, and components, and the naming convention of those resources when they get deployed.

From the standpoint of how you organize that architectural change: PR workflows. And this is fantastic. Basically, you can give your engineers domains as abstractions to think through, and then set rules. What are the things that should be shared within a domain? What are the things that should be shared within a given system? What are the things that should be shared within a component? Hopefully not too much. A template simply opens a PR in a parent repository, and then through that workflow you can have governance. You can say that staff engineers sign off on new domains, but the teams can create new systems, and certainly any engineer can create new components. It creates those clear boundaries, and it creates an idea, a hook to hang your picture of the system on, so to speak. It makes it easy to understand the architecture.
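To make that concrete, here is a minimal sketch of what one of these Scaffolder templates can look like. This is not our actual template; the org, repo, domain, and system names are hypothetical. But it shows the two ideas above: enums instead of free-form fields so the naming convention is baked in, and publishing a pull request to a parent repository rather than pushing anything directly.

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: new-service-component          # hypothetical template name
  title: New Service Component
  description: Creates a component inside an existing system, following platform conventions.
spec:
  owner: group:platform                # hypothetical owning team
  type: service
  parameters:
    - title: Component details
      required: [componentName, system]
      properties:
        componentName:
          title: Component name
          type: string
          pattern: '^[a-z][a-z0-9-]*$'  # naming convention enforced in the form, not in a wiki
        system:
          title: System
          type: string
          enum:                         # drop-down instead of a free-form field
            - loan-origination
            - title-processing
            - payments
  steps:
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          componentName: ${{ parameters.componentName }}
          system: ${{ parameters.system }}
    - id: pr
      name: Open PR in the parent repository
      action: publish:github:pull-request
      input:
        repoUrl: github.com?owner=example-org&repo=platform-apps   # hypothetical parent repo
        branchName: add-${{ parameters.componentName }}
        title: Add ${{ parameters.componentName }} to ${{ parameters.system }}
        description: New component generated from the new-service-component template.
```

The governance piece then rides on the parent repository's normal review process; for example, branch protection and code owners are one way to encode the rule that staff engineers approve new domains while any engineer can add a component.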
I mentioned the PR workflow. But also, because we're using Argo Workflows and Argo Events to capture all of our GitHub events, we can have any number of checks running, all baked into our templates. We can have audit trails coming off the changes from Argo and from GitHub. And with all of that, plus a feature I'm very excited about, we now get to lock down our GitHub repositories and have a full audit trail tracing the identity of whoever created each templated thing. So we have a full picture of what's happening in our system, and because it's all baked into Git and into really easy-to-track data, we've solved a lot of these problems. And in the future, hopefully Q1 or Q2, we start rolling out OPA, because then we can also validate with admission controllers and get that extra layer of assurance that our resources are what we would expect.

Because we were trying to move fast, we had to choose. I would have loved to spend more time rolling out custom operators and resource definitions, but the reality is we didn't have time for that. I think a lot of us get into a position where we don't have time, so what is good enough? Can you make small, modular components that have those best practices baked in? It can be anything from your database to your cloud storage to your deployments; you can easily bake those into Helm charts. You can bake your requirements and your defaults into them, and you can easily bake in different deployment patterns. If you need command-query (CQRS), cool, you can do that and provide those basics for your team pretty easily. And because you follow semantic versioning (I would hope), if you treat each template as mapping to an umbrella Helm chart, you can add your own customizations into that Helm chart, and then all you're really having to template out is some of the parameters in the values file. So we can churn out these templates quite quickly, and all the engineers have to do is think about the modular components they want.

The result is that the barrier to adopting more complex architectural change is significantly reduced. Too big of a learning curve? It's one click. You can customize it, but it's one click. Architecture is unintuitive? It becomes more intuitive when you're following conventions. There's too much to do? You have fewer decisions. And the unfamiliar patterns are solved by codifying those best practices in the code. That has taken away the reasons teams would say "it's just easier to throw it in the monolith"; they can actually ship faster by doing it this way. These are some rough numbers, but with the monolith builds and all that, people who don't use the Scaffolder spend so much more time, up to two months, trying to get their components deployed, whereas teams using the scaffolded components can spin up their IaC, their services, their pipelines, and their workflow checks in a matter of minutes. And what's important to me, because my team is scarce on resources, is that we're now spending our time not on low-value operational stuff but on higher-value stuff: making more templates, documentation, training, leveling up engineers, and creating a learning culture.
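As a rough illustration of the umbrella-chart idea (a sketch with hypothetical chart names and repository URLs, not our actual charts), the template really only has to own a Chart.yaml that pins versioned building blocks, plus a small values file:

```yaml
# Chart.yaml -- hypothetical umbrella chart that one Scaffolder template maps onto
apiVersion: v2
name: service-baseline
version: 1.4.0                    # semantic versioning, so templates can pin and bump deliberately
dependencies:
  - name: web-service             # deployment, service, ingress basics
    version: 2.x.x
    repository: https://charts.example.com/platform
  - name: cloudsql-database       # infrastructure provisioned through Config Connector resources
    version: 1.x.x
    repository: https://charts.example.com/platform
    condition: database.enabled
  - name: pubsub-consumer         # optional event-driven add-on
    version: 0.x.x
    repository: https://charts.example.com/platform
    condition: events.enabled
---
# values.yaml -- the only part the template actually has to render per component
domain: lending
system: loan-origination
component: payoff-quotes
database:
  enabled: true
events:
  enabled: false
```

Because every building block is semantically versioned, bumping a chart version is a one-line change per component, which is also what makes the version-bump scripting mentioned in a moment tractable.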
And a pro tip here, and I think some of the talks later this afternoon might have some interesting things on this: one area where we are still struggling is that some of the patterns we iterated through are not up to date, and when that happens, it creates confusion among developers. So being diligent about how you update your experience is really, really important. If you aren't, you will be drowning in YAML and it becomes really hard to script updates, whereas if you stay consistent, you can easily script YAML version bumps.

The whole idea is that we iterate and we mature, that we keep building a ton of different modular components. And we've recently seen other teams start adopting this. Our data pipelines previously were all big batch ETL, and the data engineering team has taken it upon themselves to build event-driven templates, so that now you can simply add on sidecars and these additional pieces, and any engineering team really doesn't have to do a whole lot to get at least basic change data capture or basic events consumed into the data warehouse. We are also working on making it clear what is evolving, particularly by using the Tech Radar and setting expectations for the quality and maturity of the different tools and the domain events. So as we deprecate old domain nomenclature and move to the new, that's all tracked, and we're working on making it visible through the Tech Radar; that's something we're actively implementing now. The goal is to manage expectations, to communicate pain points and where in the lifecycle these things are, so engineers understand whether they will run into hiccups and whether they need to budget extra time, so things are more predictable and they have a far more positive experience.

And finally, I would like to close by thinking about the real reason change is hard. In Buddhist philosophy there are three different categories of suffering, one of which is the suffering of change. Why is that? We think things don't change. We cling to the way things are. We think we will never grow old. We think our relationships will be constant. As engineers, we fall into that trap with the code we build, getting our ego associated with it: we've brought this thing into the world, these ideas of ours are good and valuable and almost a part of us projected out into the world. And we lose sight of the fact that it's all changing; we grasp onto the way we think things are. Change is hard. We need to remember that when we are asking our engineering teams, when we are asking people at work, to change: how do we do that with compassion, so that it really doesn't become suffering? So, thank you all. Questions?

If we have time... I don't know. Yeah, I think we have time for two questions. I see you're first. After lunch, exercise, that's great.

Hi. So, we talk a lot about psychological safety and whatnot. How do you approach, I don't want to use the word enforce, but encouraging the quality of catalog entities? What I mean by that is, there's the bare minimum you need to register something in Backstage, but then you may decide that the entities aren't as valuable as they could be without certain plugins implemented. TechDocs is a good example of that. So how do you encourage teams to implement those things? Do you just implement a catalog processor that makes that a requirement?
Or if you could speak to that at all?

Yeah, I wish I had a good answer. Part of me goes to public shaming, but that probably isn't the best option. Our Roadie implementation does have a really nice catalog checker and whatnot to ensure that some of the properties are there. But our catalog entries are actually baked into the templates, so when we create a new template, we also include all the properties we would expect for minimal acceptance. And what we are increasingly doing is highlighting how teams who are using these tools are being more effective and having fewer pain points, and then there is some level of "ooh, cool, I want this feature; oh, what did you need to do to get that?" But yeah, I think that is one of the challenging things, especially once the template is out there and you want to roll new, additional features into it. For that process, I am hoping I learn some tricks and tips from the Tech Insights presentation later this afternoon, because I think about the old Spotify video from a couple of years ago where they show how you can visualize that 80% of your services have been moved off of some old version, or whatever. I'm hoping that's something we can roll out early next year.

So, perhaps following on from the gentleman's question: your slides showed some architectures that are tigers and some that are your cat. What is the role of Backstage in tiger-like architectures?

It pretty much prevents them, because the reason tiger-like architectures come about is generally that teams think the barrier to creating a cat-like architecture is too high, at least in my experience. The other thing would be the knowledge gap around working out the boundaries of a context and those sorts of things. That's where we've tried to create an intuition, so that by pre-loading a whole bunch of domains (this is kind of a work in progress, actually) we're hoping to create boundaries that make it more intuitive not to accidentally cram something into another context, which is what winds up creating architectural complexities or challenges. A lot of that starts to come up when good intentions created a context but you really need to start thinking about a new domain, or to start thinking about the sequence diagrams. Actually, a lot of it does come down to sequence diagrams, because that way you can understand how the data is flowing through the system. And the other helpful factor is the graph views, which are very useful for understanding the different interactions throughout the system. Thank you. Cool.
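For reference, here is a minimal sketch of the kind of catalog entry that can be baked into a template skeleton, as described in the first answer above. The names and annotations are illustrative rather than Caribou's actual defaults; the ${{ values.* }} placeholders would be filled in by the Scaffolder's fetch:template step.

```yaml
# catalog-info.yaml inside a template skeleton -- illustrative only;
# the ${{ values.* }} placeholders are rendered when the template runs.
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.componentName }}
  description: ${{ values.description }}
  annotations:
    github.com/project-slug: example-org/${{ values.componentName }}
    backstage.io/techdocs-ref: dir:.   # TechDocs wired up from day one rather than retrofitted
spec:
  type: service
  lifecycle: experimental
  owner: ${{ values.owner }}
  system: ${{ values.system }}
```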