Hi, everyone, and thank you for joining us. It's a real pleasure to be here. We have to say first that this is our first time as speakers, so we are extremely nervous. And when we saw the number of people who registered for our session, which is approximately the number of services we were able to migrate, we thought to ourselves, maybe we should have migrated fewer. But we hope you're going to enjoy this session.

So this is peanut butter. Peanut butter is a delicious spread. It's made out of ground peanuts, and it's full of protein. And this is jelly. Jelly is a sweet, fruit-based spread that is full of vitamins. Now, these two spreads are well known around the world. All of you probably know them. Kids love them, and so do adults. But when you combine them both on a piece of bread, you get the perfect sandwich, with the perfect combination of sweet and salty. Now, you're probably all wondering why we're talking about food, right? That's not what we're here to talk about. So we, at AppsFlyer, made our own perfect sandwich. By integrating these two well-known technologies, GitOps on one hand and Backstage on the other, we were able to migrate all of our services, and we would like to tell you how we did it.

So hi, everyone. I'm Shachar. Next to me is Ran. We both work in the platform team at AppsFlyer. If you're not familiar with AppsFlyer, we are the market leader in mobile attribution today. You can see our amazing numbers above me. We have around 14,000 customers around the world, and we own 65% of the global market share today. Within AppsFlyer, we have AppsFlyer Engineering, which has around 400 engineers today, who operate in autonomous squads. These engineers are responsible for over 1,000 microservices, which handle over 3 million events per second. All these microservices run on top of around 250,000 cloud resources. Within platform engineering, we have the platform team.
The platform team has around 50 engineers, divided into six groups. And because we consider the platform to be a product, we also have product managers. All these people together are responsible for assisting 350 R&D engineers. It's important to know that we consider these engineers to be our customers, so we want to make sure that they can easily manage their resources.

Let's see how we evolved to the numbers you saw before. AppsFlyer was founded in 2011. We started as a small company. We had a few developers and a small number of services, which also required a relatively small amount of infrastructure. With time, as our customer base got bigger and bigger, so did the requirements. So we hired more developers, and we implemented more services, which required more and more infrastructure. With time, as happens in every company, people leave and other people join, and the dependencies between our services got more and more complicated. We were in a constant race to win the market, and that led us to an effect called hypergrowth. These charts show what hypergrowth looks like. You can see a hockey-stick effect in most of the charts, which reflects our exponential growth. But hey, bottom line, we won the market, right?

But what else happened during this process? First, we had no alignment. Teams were doing the same thing in different ways, whether it was the way they tested their services or the way they deployed them. Next, we had manually managed resources. In order to get the job done, and to do it fast, developers created and managed their resources manually. So we had no audit trail for these resources; in case something happened, we didn't know how to bring them back into production, and to do it fast. Unknown resources, dependencies, and ownership.
Because we had a lot of teams and people moved between teams, we lost a lot of knowledge about our resources: knowledge such as who owns a service, or what the dependencies between our resources are. We also had an exploding budget. Teams wanted to be successful, and they wanted to reach their goals fast, so they were using resources without taking into account how much they cost. Resources for deprecated services were still alive, and we didn't have any defined budget for our teams. And lastly, we had a lack of technical documentation. Now, let's face it: writing technical documentation is not our favorite task, right? So when you want to speed up the process, that is usually the first thing you tend to skip. All of this caused frustration among our customers, who are our developers. Or, if I can summarize it in two words: total chaos.

It sounds like a mess, right? So we understood that we needed to do better. We needed to adjust to these exponential goals, and we needed to come up with a solution that would mitigate all the issues that Shachar mentioned. We started by defining our solution principles. First, we wanted our solution to be auditable. We needed a full audit of any change made to the system. How many times has a bug started because somebody manually changed some configuration that nobody knew about? Declarative: we want our developers to focus on the what, not on the how. Our motto as a platform team is to allow our developers to do non-trivial things in a trivial way. A single source of truth: a one-stop shop for all the operational information needed, close to the developers' code. We wanted our solution to be community-driven. We didn't want to reinvent the wheel; we only wanted to use battle-tested technologies adopted by the community. Self-serve: we want our developers to be independent. We didn't want to become a bottleneck by having our developers come to us to provision resources.
And lastly, visibility. We needed a complete overview of our system: which resources we have, and what their dependencies and ownerships are.

So we started researching, and we came across this methodology called GitOps. Now, I know most of you know what GitOps is, but in two sentences: GitOps leverages Git as the single source of truth for managing the entire software development lifecycle. It emphasizes the use of declarative infrastructure as code and automated deployment pipelines. Now, GitOps is a methodology, not a specific solution. But we noticed that its core principles are actually very similar to the principles we wanted for our solution. Declarative means that the entire state of the system needs to be described in a declarative way. Versioned and immutable: the desired state of the system needs to be versioned in Git. If we need to roll back a change, we can simply do a git revert, and if a disaster happens, we should be able to get our system back to its desired state. Pulled automatically means that any new change made to the desired state should be automatically applied to our system. And lastly, continuously reconciled: we declare the desired state, and the GitOps system should always make sure that the actual state equals the desired state. If for some reason the actual state diverges, maybe because somebody manually changed some configuration or maybe a disaster happened, the GitOps system should be able to detect the drift and bring the system back to its desired state.

So with all these solution principles in mind, we started to design our solution. Our first choice was to use Git, not just for the application code, but also for its deployment manifests and any infrastructure or resources it might need. By using Git, our solution would have a full audit of any change made to it. Next, we wanted it to be declarative. We already had good experience with Terraform.
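The "continuously reconciled" principle described above can be sketched in a few lines of Go. This is a hypothetical illustration of the idea only, not the speakers' actual controller code; the `reconcile`, `desired`, and `actual` names are invented for the example:

```go
package main

import "fmt"

// reconcile drives actual toward desired: anything missing or changed
// is re-applied, and anything not declared in desired is removed.
// A real GitOps controller would run this loop continuously.
func reconcile(desired, actual map[string]string) map[string]string {
	for name, want := range desired {
		if got, ok := actual[name]; !ok || got != want {
			fmt.Printf("drift detected on %q: applying desired state\n", name)
			actual[name] = want
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			fmt.Printf("%q is not declared: removing\n", name)
			delete(actual, name)
		}
	}
	return actual
}

func main() {
	desired := map[string]string{"pagerduty-policy": "v1", "deployment": "v2"}
	// Somebody deleted the PagerDuty policy by hand: the actual state drifted.
	actual := map[string]string{"deployment": "v2"}
	reconcile(desired, actual)
	fmt.Println(len(actual) == len(desired)) // true: drift repaired
}
```

The key design point is that the loop never asks "what changed?"; it only compares the declared state against reality and repairs the difference, which is why a manual deletion gets undone automatically.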
Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. It allows our developers to write declarative code that hides the complexity of managing the infrastructure behind the scenes. With Terraform, our developers can use our platform modules to deploy their code to production, they can use it to add service resources such as Datadog monitors and PagerDuty policies, and they can use it to add infrastructure. They can also view their execution plans and understand which resources are going to be added, changed, or destroyed.

Next, we wanted a single source of truth, and we said we want the configuration to be close to the developers' code. So we created the AppsFlyer metafolder, the .af folder, which every Git repository is going to have. This metafolder contains our desired state, including all the Terraform files we just talked about. Now, we want to make our solution self-serve, and we have a metafolder in each of our Git repositories, so we need a way to automatically apply changes and reconcile on this metafolder. To do that, we created our GitOps solution, which I'm going to show you in the next slide.

We have developer A. Developer A made some code changes and declared a new service deployment plan. He commits these changes to the metafolder, and once he does that, our GitOps solution kicks in, runs Terraform, and applies his changes. A new version of his service is now deployed to production. Then we have developer B. She declared some infrastructure requirements and some PagerDuty policies. She commits these changes to the same metafolder, and once again, our GitOps solution kicks in, runs Terraform, and applies her changes. Now, apparently, developer A pushed some bugs to production, and developer C, who was on call, got a PagerDuty call in the middle of the night. Now, developer C was tired and lazy.
So instead of actually fixing the code, he decided to delete the entire PagerDuty policy and go back to sleep. Luckily, he didn't delete the policy from Git, and a couple of minutes later, our GitOps solution detected the drift and brought the PagerDuty policy back. Now, unfortunately, we don't have time to get into the implementation details of our GitOps solution, because that would take the entire rest of the session, but you are more than welcome to come and talk to us after. I can say that we used Flux and we wrote some custom Kubernetes controllers using Kubebuilder. Overall, we only used community-driven tools, and our solution is now self-serve.

Okay, so now we know how we're going to manage our resources, right? But it feels like we still have some unanswered questions, like: where can I find all my team's services? Who is the owner of a specific service? Who depends on my resource, and if I make a change in this resource, who will be affected by it? For these questions, we still had no answers. So we needed to find some kind of solution, a solution that could help us visualize all of our resources in one place. We knew that this solution must be intuitive, because we know from previous experience that if it is not intuitive, our users will simply not adopt it.

So we chose Backstage. Backstage is an open platform for building developer portals. It was originally developed at Spotify as an internal tool, and later it was released as open source to the community. Backstage is an unopinionated platform, so it doesn't tell us how to do things; it just gives us the ability to build our own platform with our own capabilities. And it has some core features that are important to understand. The first feature is an easy-to-use software catalog. This catalog manages all of our software in one location.
It has a unique and uniform look and feel, and all the resource knowledge is located in YAML files that look very similar to Kubernetes manifests, which helps our developers read and manage them. This is an example of a software catalog; this image was taken from the Backstage website. You can see that we have three resources here. I have the ability to search for my resources, and I can also mark the ones I'm interested in using the star option. And here is an example of the YAML file. You can see that it is very similar to a Kubernetes one, as mentioned before, right? We have the annotations key, and we can also add a labels key if we want.

The next feature is documentation as code. Backstage allows us to write documentation easily using Markdown files. These files can be added as part of our feature, so once we develop a new feature, we can also add these Markdown files, which will later be automatically rendered in Backstage. Of course, these files are maintainable, and we can find all the documentation easily within Backstage. This is just an example of how the documentation looks in Backstage. You can see that it has a really nice look and feel, even though, behind the scenes, it is still just a simple Markdown file.

The next feature is a powerful search. We have to understand first that Backstage indexes all of our information automatically, so that everything can be searched. If you really want to, you can customize your search so that only specific entities, or properties within those entities, will be found. This is just an example of how a search result looks. You can see that each result has its own tags, according to the specific customization you did.

The next feature is automated software templates. Think of these templates as a getting-started guide. This guide can be created with a simple YAML file. This YAML file will have your fields and the steps that you want the guide to perform.
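As a rough illustration of such a file, a minimal Backstage software template might look like the sketch below. This uses the generic Backstage scaffolder format; the template name, owner, and skeleton path here are invented for the example and are not the speakers' actual template:

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: example-service-template   # hypothetical name
  title: Create a New Service
  tags: [recommended]
spec:
  owner: platform-team             # hypothetical owner
  type: service
  parameters:                      # the "fields" the form will show
    - title: Service details
      required: [name]
      properties:
        name:
          type: string
          description: Unique name of the service
  steps:                           # the "steps" the guide will perform
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
```

The `parameters` section becomes the form the developer fills in, and the `steps` section is what runs when the form is submitted.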
Later, once you create this YAML file, it will be automatically rendered as a form. If you want, you can also customize this YAML file, or the fields, or the steps, so that they meet your specific needs. This is an example of how the templates look within Backstage. You can see that for each template we can also add some additional information, such as tags, owner, or description, so it will be easy to search and to understand the purpose of each template. And this is how a simple form, a simple template, will look. Again, no code was needed, only a YAML file.

And the last feature is self-contained plugins. Think of a plugin as a functional building block. Each plugin allows us to expose a certain capability that we want within Backstage. Each plugin has isolated code, so the teams in your group can build their own plugins without affecting other plugins. And if you want, there is a marketplace of open-source Backstage plugins developed by the community. This is just an example of how a custom plugin can look; you can see we have a gauge and several charts here. And again, if you want, you can go to the marketplace and find a specific plugin that was already developed.

Now, with this knowledge, we decided at AppsFlyer that Backstage is going to be our one-stop shop for everything, for externalizing all of our platform capabilities. We also decided that each entity or resource that is going to be added to our catalog should first be standardized, and only then added to our catalog. Regarding plugin development, we decided, following the information above, to distribute the development among our platform teams, to speed up our development process and also increase the accountability and ownership of each team within our platform. So let's take a look and see how the architecture looks at AppsFlyer. First, we have Backstage itself.
Backstage consists of two services: one is the client side, which is developed in React, and the other is the server side, which is developed in Node. Then we have a dedicated database to keep all of our Backstage persistent data. Now, following what we said before about the YAML files, we decided to keep all of Backstage's YAML files in one dedicated repository in GitLab. And finally, we have the catalog management service, which is developed in Go. The purpose of this service is to expose CRUD operations over our YAML files, so that later they can be synced back into Backstage. During the next slides, we're going to refer to these parts of the architecture as Backstage itself. And that actually covers all of our solution principles.

Okay, so we developed a GitOps solution and a Backstage solution. In order to start using them, we have a new requirement: we need all resources to be managed using Terraform and to appear in Backstage. Just a quick recap: every Git repository has a metafolder containing our desired state. All the developers need to do is push their Terraform files into this folder, and then our GitOps solution automatically applies changes and reconciles on it. That actually covers the first part of the requirement, right? But we're still missing the second part, which is that all resources should appear in Backstage. Now, before we get to how we did that, let's first see what types of resources we wanted to see. Here is a partial list of the resource types we would like to see in Backstage. And because we consider the service to be our core resource, since all the other resources will later be connected to it, we decided that it made complete sense to start the migration process with the services, moving them from another internal tool we had into Backstage. Let's see how the migration flow looks. First, we created our own template, using the templating capability we talked about before.
That template has all the relevant fields of the service. It also has a step that sends a POST request to the catalog management with all the relevant information. The catalog management then reads this information, creates a dedicated Backstage YAML file, and saves it in our repository. From there, Backstage uses its syncing capability to sync the newly added YAML file back into our catalog. So we got our first requirement covered. But is it enough?

On one hand, we have Backstage. Backstage is based on YAML files. On the other hand, we have GitOps. GitOps operates on Terraform files. But these two systems don't actually talk to each other. Look at this poor developer who simply wants to add a new Lambda to his code. In order to do that, he first needs to add a Lambda Terraform file to his repository, and then our GitOps solution will create it for him. But he's not done yet, because now he needs to go to Backstage, use the templates, and create his Lambda in Backstage. If he edits his Lambda in one system, it won't be reflected in the other. Obviously, this flow is broken, because managing multiple states for a single resource is exactly what we wanted to avoid in the first place.

So we needed to adjust our initial requirement: we need all resources to be managed using Terraform, to appear in Backstage, and to always stay in sync. If we create a resource using GitOps and Terraform, it should appear in Backstage. And if we create a resource using Backstage, it should result in a Terraform file. On the left is the GitOps architecture; on the right is Backstage. Now, going back to our migration process, and according to our new requirement, when a developer wants to migrate his service using Backstage, it should result in a Terraform file. But what is a Terraform file for a service? At first we thought that a service equals its deployment.
We thought we could simply use the deployment manifests, which are already written in Terraform. But a service might have multiple deployments, or maybe no deployments at all. So we realized that a service can be described even before it has a deployment, and it can contain metadata such as language, description, or PagerDuty. In order to support all of those, we created a Terraform module for a service.

Now, how does it all work? When the developer wants to migrate his service, instead of Backstage sending a POST request to the catalog management, Backstage creates a service Terraform file in the developer's repository. As usual, our GitOps solution runs Terraform, and when it's done, and this is the part where the magic happens, we send the Terraform state to the catalog management, which knows how to parse the Terraform state, extract the service definition, and create a service YAML file, which can then sync to Backstage.

So that's great, but how does it look from the user's perspective in Backstage? The user sees three steps during the process. The first step is the migration of the service, which is responsible for creating the Terraform file within the user's repository. Then, as Ran mentioned, GitOps kicks in, right, and the flow starts processing. So we wait until the new YAML file is created and saved in our repository. Then comes the last step, which is Backstage validation. In this step, we wait until Backstage syncs the new YAML file into the catalog. Once this process is done, we have a fully managed service, using Terraform and GitOps, that is also visible within Backstage.

A bit about our migration process: we were able to migrate more than 1,000 services, the migration involved 45 teams in our company, and the entire process took us approximately five months. Our developers liked it.
Most of them preferred using Backstage's nice UI and templates to create resources over going directly to the repository and creating Terraform files. So they started asking us to support more resources in Backstage. First they wanted us to support Lambdas. That means we had to go to Backstage and create a Lambda template, which creates a Lambda Terraform file in the repository. Then we had to go to the catalog management and add a mapping that can parse the Lambda Terraform state into a Backstage entity. Then they asked us to support an S3 bucket, so we had to do the same: create a template in Backstage and add a new mapping in the catalog management.

But then came the advanced developers. They had already started using Terraform and GitOps to provision resources that were not yet supported in Backstage, and they complained that they were not able to see these resources there. The reason they couldn't see them is that we didn't know how to map their Terraform state into Backstage entities. At this point we realized that we needed to show them something, because we were never going to be able to map every possible type of Terraform resource into Backstage. So until we do, we created a default mapping: any unknown resource that we don't know how to map is mapped to a generic entity in Backstage.

To understand what a generic entity is, first we need to see what a non-generic one looks like. This is an example for a service. You can see that we have custom plugins that are specifically relevant for services, such as PagerDuty, and we have tabs for the CI/CD or for the deployment. And in contrast, we have an example of a generic entity, like an RDS. Here you can see we only have some shallow information, like metadata. But what we did get is this: a full representation of a service. Now we know which resources it depends on, who the owner is, and to which system it belongs.
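The default-mapping idea just described can be sketched in Go (the catalog management service is written in Go, though this is an invented illustration, not its actual code; the `Entity` type, `mappers` table, and resource type strings are assumptions for the example):

```go
package main

import "fmt"

// Entity is a minimal stand-in for a Backstage catalog entity.
type Entity struct {
	Kind string
	Name string
}

// mappers knows how to turn specific Terraform resource types into
// Backstage entities. Anything not listed here falls through to the
// generic entity below.
var mappers = map[string]func(name string) Entity{
	"aws_lambda_function": func(name string) Entity { return Entity{Kind: "Lambda", Name: name} },
	"aws_s3_bucket":       func(name string) Entity { return Entity{Kind: "S3Bucket", Name: name} },
}

// mapResource converts one Terraform resource into a Backstage entity,
// using the default mapping for unknown types so the resource is at
// least visible in the catalog.
func mapResource(tfType, name string) Entity {
	if m, ok := mappers[tfType]; ok {
		return m(name)
	}
	return Entity{Kind: "Generic", Name: name}
}

func main() {
	fmt.Println(mapResource("aws_lambda_function", "billing-handler").Kind) // Lambda
	fmt.Println(mapResource("aws_db_instance", "users-db").Kind)            // Generic
}
```

The point of the fallback is visibility over fidelity: an unknown resource shows up with shallow metadata instead of not showing up at all, and a richer mapping can be added to the table later without changing the flow.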
So, a bit about what we learned from this entire process. First, change is hard. Any change you want to make in an organization with hundreds of developers will be difficult. People are used to doing things in their own certain way, and they tend to resist change, even when it can help them. So always be prepared to answer all their questions and concerns. Developer experience: it is important to always look at your developers as customers and try to make them enjoy your new process. By providing them with a good user experience, you can increase their trust in your new change. And lastly, today we talked about our solution in our company and how it helped us solve our own problem. You can decide if you want to use this one, or you can choose one of your own. Whichever solution you choose, just make sure that you always manage your resources.

To sum up what we learned today: we started with total chaos. All of our resources were managed manually, and we had no alignment. Then, by integrating these two great technologies together, we are now able to say that we manage our resources automatically and efficiently. Thank you very much. Before the questions, we would really appreciate any feedback you have, using this QR code, during question time. Yeah. Please go to the mics for questions.

Sure. No. Oh, now I can be heard. First of all, it's the best presentation I've seen so far at KubeCon, so congratulations, really amazing job. Thank you very much. And my question is about the platform: is it something you built only for your internal teams, or is it something you're planning to open source or commercialize? Sorry, can you repeat that question?
Is the whole platform, not only Backstage, but the whole platform that Backstage is supporting and serving for the developers, something you have built only for your teams specifically, or is it something you are going to open source or commercialize? Because it's a really amazing solution I've seen today. So this sounds like a question for our VP. We did externalize some of our tools, they are open source, but this solution specifically is internal. I know that we've talked to some guys at Flux and other people from the open-source community, but currently it's internal only.

So maybe a second one, just a quick one. Have you considered buying such a tool instead of building it? Because you built it, and you probably considered trying to find something. Is the reason you built it that you hadn't found anything, or was it expensive, or was it just not for you? To be completely honest, when we started to design the solution, it was all new. GitOps was new, Flux had just started. So we looked for existing solutions, but at the time we needed a specific solution, custom to our needs, and there wasn't one available. If today we had to redo it all over again, maybe we would have chosen another solution. Thank you so much. Amazing work, guys. Thank you.

I have a question too. Can you hear me? Yeah. Thank you. Thank you for the presentation. I have a multi-part question. The first part is: how was the onboarding process within your teams? Did you start on a wide scale, or did you start simpler? And how did you introduce this new tool to your developers in the beginning? So, as mentioned before in the presentation, change was hard, right? People were not really accepting the change. So we always had to talk to the teams first. We had a video explaining the reason we were doing it and how we were planning to assist with it.
Also, we didn't force them at the beginning to make any of the changes. We gave them a long due date, and for any concern they had, we opened a dedicated channel and were basically assisting them, whether in the channel or just in private meetings. But bottom line, when you want to make a change, you also have to set a due date. So it was an ongoing process: we were following how many services had been migrated, and once the due date was approaching, we started telling teams that they had to finish.

Let me add to that. At first, we created a tool that opened a merge request creating the service Terraform file in every Git repository. But this kind of defeats the purpose, because we wanted our developers to experience our new solutions and understand what's going on. So eventually, it took them a couple of months, but they did it anyway.

Right, so your catalog management Go tool, was it there when you started, or did it come along when some of the teams had already migrated? Sorry, can you say that again? The catalog management application that you have, was it something you rolled out from the beginning, or did you have more of a manual process and adapted later? Okay, so the question was whether the catalog management was something we created internally. Yeah, we did it from scratch. We didn't want anyone to edit the entities directly. This was the single way to expose CRUD operations over the entities, so that we have full audit and control over them.

I won't take all the time, just a small question: how did you observe or gather feedback from the migrating teams? Which metrics did you have in Backstage, so that you could know how many teams migrated, how many services migrated, and so on? So the question was: how did we know the statistics about how many services and teams had migrated or not?
Well, we had a couple of ways, both from Backstage and from our catalog management. As we said, we had full observability on the services in Backstage; we could tell you any number you just mentioned. We were also able to compare with the other internal system that we had, and see how many services we had there compared to how many we have today. Also, I can say that during the process itself, at a certain point, we blocked creating new services in the other internal system, so that teams would need to migrate their services. Thank you.

Hello. Thank you for a really great presentation. Yeah. I have a question, maybe not directly related to the content, but our company is going through a similar process, let's say from traditional CI to GitOps, and what we noticed is hard for developers is that previously, in the UI, they could set up different workflows. Like, you deploy first to dev, and only then can you deploy to stage, prod, et cetera. And with GitOps, we kind of lost that. Have you had this problem as well? How do you manage workflows in GitOps in this system, or is that not there yet? Sorry, we really couldn't hear; can you maybe speak up a little? Sure. Oh yeah, I really was not standing near the mic. So, the question is maybe not directly related to the content of the presentation, but our company is going through the same process, migrating to GitOps, like a lot of developers. And what we noticed is that in the standard world before GitOps, you had, let's say, some UI with a pipeline, and developers see: okay, I first need to deploy to dev, then to stage, then to prod. You have some green marks, et cetera. And with GitOps, we kind of lost that. Do you have some kind of solution for it? Some policies, like you need to deploy first to this environment, or maybe some UI in Backstage or somewhere, showing how the flow should look? That's a great question, and it was one of our biggest pain points.
So yeah, the developers used to work with pipelines, and they could see whatever was going on. And once we switched to GitOps, everything was async; they had no idea what was going on. So at first, our GitOps solution, since we use GitLab, marked a job on the commit. If the commit failed, we would mark it as failed, and we would comment on the actual merge request and tell them the reason. And in Backstage, during the service migration, as Shachar showed in the presentation, we actually open a WebSocket and show them the process, how long it's going to take GitOps and Backstage to sync the new entity.

And do you have some connections between environments, you know, like you need to first deploy to this environment and then to another one? Or, if I want to deploy to prod, can I do it right away, or do I need to follow some workflow? So, our GitOps solution is based on Terraform files, which are completely agnostic to environments. If you want to deploy your code, or a service resource, or whatever, to a new environment, all you need to do is use the right modules, and we'll apply it for you. Okay, I got it, thank you.

So yeah, I was wondering how you managed interdependencies between shared resources or between services. We can't hear you, sorry. Okay. I was wondering how you managed shared dependencies between services, like a shared cluster or a shared S3 bucket or something like that. Or were your services really independent from each other? So, I think the biggest challenge in Backstage is knowing how to map resources in the catalog. Each resource can be mapped in several ways, right? For example, you can map a database as one entity, or maybe you want to map each table as an entity itself. So it really depends on how you want to map your entities.
For us, as we explained here, the entities that we decided to map were added to the catalog, and the ones we didn't were just added as generic ones. I'm not sure if that answers your question; let me know. Okay, we need to clear the stage, but we're going to stick around for a while, so you're more than welcome to come and talk to us. Thank you very much.