Hello, everyone. Thank you for joining our session. We are so excited to be in Paris, to be at KubeCon, and to be speakers today. Personally, it's my first time speaking at such a big conference as KubeCon. That's why I'm slightly nervous, but still excited. Thank you. Thank you. Today we will talk about the cell-based architecture, or the CBA. We will first talk about what the CBA is. Then we will share with you our business drivers for using the cell-based architecture. Then we will move forward with the case study, where we will present our solution architecture. And in the end, we will share with you the learnings we got by working with the CBA. Before we proceed, let us introduce ourselves. My name is Rosti and I'm a cloud solutions architect from Booking.com, mostly working with AWS and Kubernetes. Everything is, I think, in the speaker agenda. I'm mentioned there as nine times AWS certified, but I became fully AWS certified recently. So I just like cloud and cloud native. And now, Shweta, could you introduce yourself?
Thank you, Rosti. Good afternoon, everyone. How are you all doing? Great. So as Rosti said, we are both solutions architects from Booking.com. The first question I would like to ask: how many of you love to travel? A raise of hands? Almost all of us. That's great. Because at Booking.com, as you may know, we have a mission to make it easier for everyone to experience the world. With that mission in mind, we are very selective about our technologies as we work towards it. And of course, open source technologies have a big role to play in that. Apart from that, I'm more interested in security talks, so that's why this talk is around security as well. But we can catch up later on that if there is anything you want to bring to us. With that said, let's dive into our topic for today: cell-based architecture.
Cell-based architecture is essentially a decentralized reference architecture, which was first introduced by WSO2, an open source group. Since then, it has really picked up well and has been adopted by us as well. Before we see the details of cell-based architecture, I would like to dwell on the point: why cell-based architecture? At some point between 2014 and 2020, we said that monoliths are not good. They are centralized, and though they are good in some sense, we wanted to move away from monoliths. That's where we came more towards microservices, containers, and the container orchestration world. But when we moved away from one problem, we moved towards many smaller problems. So how do we deal with this? And to complicate the problem, we have more and more cloud services, everything as code, and many more things.
To handle that, certain architecture patterns were introduced. One is service mesh architecture, which doesn't deal with the system architecture as a whole but with how services talk to each other, how such granular services can streamline their communication. This talk is not about that, but if you're interested, this is the QR code where you can go through a detailed case study about it. We were also introduced to domain-driven architectures, which say how you design your products and services around your domain, your specific need. And this paper introduced the cell-based architecture, which is the talk for today. Cell-based architecture, in a way, also makes domain-driven design a subset of it, and we'll talk about that. Now let's look into the cell, because the entire premise is around what cell-based architecture is.
The smallest unit is like any biological cell. We say that a biological cell is the smallest unit, which contains, in this case, some chromosomal DNA, a membrane, an outer layer which protects it, and a layer which lets it communicate with the outer world. What is peculiar about this cell is that it can independently create, produce, and consume its own metabolism and can secure its local structure. So it is a self-contained unit on its own. It is always considered as one unit, it is independently scalable, and it is isolated, so it can maintain its own state without necessarily sharing it with the outer world. With that said, we relate it to the software cell, or the architectural cell, where the cell can be comprised of many components. Those components can vary: it may be just one microservice, or there can be many microservices, along with all that is needed to make your workload complete. That's what makes it a cell. Cells also have a boundary, as shown in the outer layer: the cell boundary or the interfacing layer, the way the cell talks to the outside world. And we like this cell, too, to be treated as one self-contained unit, independently scalable, with its own local state and control.
With that introduced, let's see what can go inside a software architecture cell. Biological cells can be different: a plant cell can have different contents from a human or animal cell. Similarly, cells can be different for different people and different enterprises. Typically, a cell's components can be your compute with services; data storage, which is required to maintain the state of your services; its configuration; and of course the security infrastructure around it, that is, how you segregate and separate the internals from the outside. And then there is its essential lifecycle management, like logging and monitoring, DevOps, and infrastructure as code.
Now, the beauty of the cell comes out when you see it from different perspectives. In our experience, it can be a technology-driven cell. Teams who are more cloud and cloud native, and who like to experiment a lot with newer features, like to see cells as technology driven. For example, one of your teams has a cell with more serverless components, and it may be written in Java or in Go. Another team might like Python and prefer more predictable workloads, with Kubernetes clusters running. So they can have their own state, and the cell gives them the ways and means to do it in their own way. You can also have a domain-driven definition of these cells. Within Booking we have both flavors: there are teams using cells more for technological ways and means, and there are teams using them in a domain-driven way, where product teams like to have their products categorized as one cell, and that's how they like to develop and scale their work.
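As an illustration that is not part of the talk itself, the cell contents listed above (compute, data storage, configuration, security, lifecycle tooling) and the two flavors (technology driven, domain driven) could be written down as a small descriptor. This is only a sketch: `CellSpec`, `CellFlavor`, `missing_parts`, and every service or tool name in the example values are hypothetical and do not come from Booking.com's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class CellFlavor(Enum):
    TECHNOLOGY_DRIVEN = "technology-driven"
    DOMAIN_DRIVEN = "domain-driven"


@dataclass
class CellSpec:
    """Hypothetical description of what one cell contains."""

    name: str
    flavor: CellFlavor
    services: list[str]                                        # compute
    data_stores: list[str] = field(default_factory=list)       # local state
    config: dict[str, str] = field(default_factory=dict)
    security_controls: list[str] = field(default_factory=list)
    lifecycle_tooling: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the component categories this cell has not declared."""
        parts = {
            "services": self.services,
            "data_stores": self.data_stores,
            "config": self.config,
            "security_controls": self.security_controls,
            "lifecycle_tooling": self.lifecycle_tooling,
        }
        return [category for category, value in parts.items() if not value]


# Example values are invented for illustration only.
payments = CellSpec(
    name="payments",
    flavor=CellFlavor.DOMAIN_DRIVEN,
    services=["charge-api", "refund-api"],
    data_stores=["payments-db"],
    config={"region": "eu-west-1"},
    security_controls=["mtls", "policy-checks"],
    lifecycle_tooling=["logging", "monitoring", "iac"],
)
reports = CellSpec(
    name="reports",
    flavor=CellFlavor.TECHNOLOGY_DRIVEN,
    services=["report-function"],
)

print(payments.missing_parts())
print(reports.missing_parts())
```

A descriptor like this makes it easy to see at review time whether a proposed cell is really self-contained or still leans on something outside its boundary.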
There is one more thing, about cell sizing: one cell can be just one service, or one cell can have many services inside it. A cell can have one instance, but in a more cloud native sense the same cell can have multiple instances, and we'll see that in the case study as well. Another important part of the cell is cell communication. That is very important because it's not just about the internals of one cell; it's what keeps the cell isolated from the outer world as well. That's where we say you need to have your central gateway, which keeps the cell disconnected from the outer world, and a strict API contract, where each cell exposes very little to the outside world and sticks to that contract. On the other hand, you can also have local cell gateways, which take care of any internal communication that should not be exposed outside. So it's a kind of local control plane versus global control plane, essentially for the networking. There is another pattern we are using: if you have cells which need not be exposed to the outside world, you can expose them through event and messaging systems and through the internal gateways. I'll now request Rosti to walk us through the specific business drivers which made us adopt cell-based architecture.
Thank you, Shweta. Before we move forward to the use case, I'd like to share with you our business drivers behind the cell-based architecture. First of all, as was mentioned before, within the past five to ten years a lot of companies moved from monolithic applications to microservices.
It was our journey too. After migrating our services and establishing the microservice ecosystem, we faced a bunch of challenges. The first major one, as you can imagine, was dependency hell. At Booking we have a really big number of services, so it is complicated to control and track all the dependencies we have, and also to ensure that one service communicates with another properly. Another challenge was that we moved from a single-tenancy to a multi-tenancy model, and it brought extra challenges. Before that, we had to deal with a single contract, but now we needed to deal with multiple contracts: one tenant can have one set of requirements, another tenant can have another set, and we need to do something to satisfy all of them. It's a complicated topic again, and it brought more challenges on top of our dependency hell. Yet another major challenge was security. Security is quite an important topic at Booking.com, and we need to comply with multiple internal and external security standards. Just by seeing this picture of dependency hell, you can imagine it: I'm looking at it right now and thinking, okay, I need to build some security boundaries somewhere, but where should I put them? It's complicated, but it's feasible; we could basically draw a shape in this picture. But then the question is, what if I need to comply with more than one security standard? It becomes much more complicated, but we need to deal with it anyway. This is where we realized that the cell-based architecture could be a good fit to address those challenges. And on top of
that, we had more drivers. First of all, we also encourage our product teams to leverage cloud native and open source technologies and to be more flexible in choosing the particular technology which is a good fit for a particular use case, and we wanted to incorporate that somehow into our design. Then we wanted, again, to have the capability of being compliant with more than one security and compliance standard in parallel, and to provide that by design as well. And for sure we would like to address some quality attributes, again by design. All the things you see are about how we could help our product teams to be more focused on business logic and development instead of dealing with a lot of infrastructure themselves. After trying the CBA, we realized how we can address all those needs with it; you can see the benefits listed here.
First of all, we introduced two dimensions for our cells. The first dimension is domain- or subdomain-driven design, where we introduce a cell for a specific subdomain and put inside it the services which represent that subdomain. It gives us a good balance with the product teams, because as long as the cell contract is not broken, it's not so important what is running inside. That's why developers from product teams can use different technologies: they can experiment with cloud native and open source, bring more tools they'd like to test, put them inside the cell, and it will be fine. The other dimension is security-driven design, which is our way to address security and compliance: here a cell represents a particular standard or set of standards. You see, by having this boundary, and by having that cell gateway and
internal gateways which Shweta mentioned, we have a way to enforce those controls and have a guarantee that no one breaks those security boundaries, so we remain compliant even with multiple components and multiple services inside a cell. Apart from a security standard, a cell can also represent, for instance, a workload specific to a particular tenant, so it's not only about pure security requirements but also about particular tenant requirements. Another thing was scalability, because we also learned that we can scale our workloads just by creating multiple cell instances. We can spin up another cell instance of the same workload and distribute the incoming traffic between them. The only thing needed, and it will be presented in the showcase scenario, is a routing layer on top to incorporate the new cells which have been created and distribute the incoming traffic among them. And now I hand over to Shweta to present our use case architecture.
Thank you, Rosti. Now it's time to look into the technical implementation. We'll zoom in on a glimpse of how this implementation has been done for one of our domains.
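As an aside that is not from the talk, the routing layer Rosti describes, one that picks up newly created cell instances and spreads incoming traffic among them, can be sketched in a few lines. The round-robin strategy, the `CellRouter` class, and the instance names are all assumptions made for illustration; the talk does not say which balancing strategy or tooling is actually used.

```python
class CellRouter:
    """Minimal round-robin router over the registered cell instances."""

    def __init__(self) -> None:
        self._instances: list[str] = []
        self._cursor = 0

    def register(self, instance_id: str) -> None:
        """Incorporate a newly created cell instance into the rotation."""
        if instance_id not in self._instances:
            self._instances.append(instance_id)

    def deregister(self, instance_id: str) -> None:
        """Remove a cell instance that has been torn down."""
        if instance_id in self._instances:
            self._instances.remove(instance_id)

    def route(self) -> str:
        """Return the cell instance that should receive the next request."""
        if not self._instances:
            raise RuntimeError("no cell instances registered")
        instance = self._instances[self._cursor % len(self._instances)]
        self._cursor += 1
        return instance


# Instance names are hypothetical.
router = CellRouter()
router.register("payment-cell-a")
router.register("payment-cell-b")
first_four = [router.route() for _ in range(4)]

# Scaling out: a third instance joins and starts receiving traffic.
router.register("payment-cell-c")
next_three = [router.route() for _ in range(3)]

print(first_four)
print(next_three)
```

The point of the sketch is that scaling a cell only touches the routing layer's registry; nothing inside the existing cell instances has to change.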
This is the reference architecture for the payment domain. I will go into the technical architecture, but what this represents is that you have the payment cell on the leftmost side, labelled cell one. This cell exposes its services through the global gateway, which goes through this secure traffic routing layer, and that's the only route through which it is supposed to expose its services. The stacked layers show that, this being a critical service, there are multiple instances of this cell, as we were talking about. It talks over a secured TLS layer to the policy management cell, which is cell two. The policy management cell is important but not as critical as the payment cell, so it is shown with a single layer here: it has one instance, but there are deployment means which make sure that if anything happens to this cell, another one is brought up quickly. And there are other smaller cells shown here, like cell three, which is more to indicate the internal communication with your shared services or other cells which might be in the system.
Now, what does it look like inside? If you look at this payment cell, it has an AZ 1 cell instance, AZ 2, and AZ 3. This is again to isolate: because the cell is regional, there can be many cell instances in this particular payment domain, but they cannot be exposed outside the region, so we have replication at the AZ level. If we now go inside a cell: in this case, because this has to do with a more predictable workload, each cell has a Kubernetes service running, and that's what is used as the compute for the various microservices. One thing to note is that you have routable and non-routable subnets. The actual workload resides in the non-routable subnet, where you have all the implementations and your key stuff,
and it talks to the outside world through the routable layer, which is exposed outside. There are a lot of other security services, which are very difficult to show in one slide, but to tell you in a nutshell: we have SPIFFE for identity management, we use Vault for secrets and configuration, we use a passport for application-based access control, and we have other guardrails like OPA and other things. With all that security tightened inside, the way outside is through a transit gateway, as shown in this particular case. If you're interested in knowing more about SPIFFE identities and how identities get managed, there is another talk, on PKI and certificate management, which you can refer to later.
Let me zoom in on another cell, because this cell is different from the one I showed previously. This one has to do with unpredictable workloads, so we have serverless-based services there. For example, we are using Lambdas and ECS Fargate, which provision based on the need as it arrives. This is a single instance. From a security perspective, it follows similar security norms. And this is how the whole picture comes together: block one and block two go well together, and both are exposed to the outside world through the transit gateway. One thing I would like you to notice in this diagram is that cells one, two, four, and five, all four of these cells, can go away and come up again. They are independently deployable, except for this networking layer, which is a thin layer on top that may not be in the form of a cell. That is where we put all our firewall security towards the outside world. I'd also mention these layers four and five, because they give you the essential deployment features which you need with those cells. Cells need to be enhanced too; they have their own lifecycle, and their features get added and removed day by day, so
that's the layer which takes care of your cell instances and their sub-services and how they get deployed. At the same time, you have a layer for exposing any essential monitoring data you need to expose to the outside world. So that's how the whole payment and policy domains come together and work together. But there is much more to it than this. We have some learnings, and there cannot be a better person than Rosti, who has been hand-holding these teams, to tell us what can go wrong and whether cell-based architecture is for you or not. So let's hear from him.
Thank you, Shweta. So what were our learnings after working with the CBA? First of all, we ended up setting up an architecture governance process. We created a governance group of cloud, product, and [unclear] architects, and we developed an internal standard regarding the CBA. We encourage our teams, if they think the CBA is a good fit for them, to create a proposal and share it with us, so we can comment on it, advise them whether the CBA is a good choice for this particular use case, check whether all our internal requirements are met, and engage more stakeholders if needed to raise and address more concerns. It's really better to have such a process, because in the beginning it was a bit chaotic, let's say.
Then, when should you actually consider the CBA? You should consider it if your challenges are security and compliance, like ours, and if you are looking for repeatability, where you need to spin up new versions of your applications and you need to experiment and test more, and so on and so forth. You should also consider it if your architecture looks like ours: if you have a lot of services which talk to each other using different protocols, and if you have big trust boundaries. And if you foresee in your future either a big refactoring of your microservices or if you're
planning a big migration to microservices, if you are also looking at how to improve your reliability, and if you foresee more granular trust requirements, then yes, you should consider the CBA. But please remember to be careful with it and assess it specifically for your company, your technologies, and what exactly you are doing, because sometimes it can be just overkill. That's it from us. You can see our QR codes here; you can reach out to us on LinkedIn and message us with your feedback, thoughts, or comments. Thank you. I think we have time for questions, if you have any. I think there's a mic here, if you can please come up, or maybe we can pass this one. I think we have a special microphone. Yeah, there is one.
Hello, can you hear me? Okay, you can hear me, good. A question about your cells and your payment example. When you talk to the data layer, you're going to have to partition your data, unless you have one dependency, so I'm curious how you've done that. If you've got individual cells and you're saying the data belongs to a cell, then that data belongs to that one cell and is not shared with other cells. So I'm curious how you broke apart or partitioned your data, and how you decide which is the right cell to own that data.
Yeah, so the question is how we partition and utilize data for our cells. Two things I would like to mention, and Rosti, please add if you have more. One is that in this particular case, payments, some of our financial technology business has picked up cell-based architecture, and for them security and compliance are very, very important. So yes, we have cell-based DBs where data is dedicated to those cells, and there is very minimal sharing between cell instances, only what you cannot live without; if any synchronization has to happen, that happens. Apart from that, in terms of security, because the whole premise we are talking about here
is security, we have everything taken care of in terms of data encryption, and data moving in transit should be well encrypted. That's why I was mentioning all that PKI infrastructure. That's how it is being done. Anything you want to add?
Yeah. What you could also do is create a special cell for data and move your data, or your database, there. That is another way to address those needs. The only thing is to establish the communication properly and decide whether you need some routing or logic between them in order to deal with partitions, and so on and so forth.
Thank you very much.
Thank you. Thank you for the talk, it was very fun. I've got a question about cross-cell incidents. If I understand correctly, cells mostly map one-to-one onto teams, and if the cells are polyglot, meaning completely different tech stacks may be used within the cells, an incident that touches multiple cells might be difficult to resolve. Does it backfire in any way, or is it actually simpler than I expect?
I'm not sure I got your question correctly. Rosti, can you paraphrase it for us?
I'll try. If every cell has its own tech stack and there is an incident that touches multiple different cells, is it difficult to resolve that incident because of the cell-based infrastructure, or not?
Definitely there are incidents, and I get your point: when you have a common incident which touches two or three cells, how do you really diagnose that? But I am not able to recall any such common incident, because the cell design is such that you are able to resolve an incident within one cell. However, to answer it properly offline, I would like to check with our incident teams whether they have firefought something like that; they might have some.
In principle, yes, we didn't have such an experience. But the thing is, you see, the whole idea is that a cell represents
a way to reduce your blast radius. The idea is that if something happens, it happens within a single cell, not within multiple cells. Usually you could spin up another cell instance and tear down the existing one; maybe it helps, but at least it's something you could try. Or you deal with the incident by following all the monitoring tools. So if in your situation you see that multiple cells become broken, most likely you need to revisit what exactly you put into those cells, because by design it shouldn't be the case that multiple cells are broken. For sure it can happen, but in our scenario incidents were usually tied to a single cell. But as Shweta mentioned, we could reach out to our internal team to find out for you whether we had such incidents.
But in general, that would be a symptom of something being designed wrong with the cells themselves?
Yes, quite possible. And I'm getting an indication to close, so we can discuss it offline. Thank you all. Thank you. Thank you.