Great to be here to see everybody. Just bear with us as we get set up here. You can probably tell from this talk title that there's a product guy in the room. I used to be an engineer, so I had to really fancy this up and make sure I had something that was flashy. So bear with me. I'm truly an engineer, but I have a product title. The first shot, we did it. All right, we're good. Kicking off day one with big bucks no wins. OK. Hi, everyone. My name's Tony Goslin. I'm a cloud engineer with the Ethos team at Adobe. I'm Joseph Seneville, and I'm with Adobe as well, a principal product manager there. I used to work in a few different roles; I was part of an acquisition where we were early Kubernetes adopters. At that time, we were a Calico user, and eventually we shifted over to Cilium. We also have a newly formed End User TAB group that has just started up, which the CNCF has sponsored. I'm one of the TAB members for the end users. If any of you are end users as well, I hope you take a look at it. We're hoping this is going to be a really great way to drive feedback into all the projects across the CNCF, like Cilium. And as well, I'm with SIG Release, as a release manager associate: how things get packaged and delivered, all the plumbing behind the scenes. I'm with that team as well, and I'm always looking for individuals who want to get involved in that aspect of it too. But just going back to what I was mentioning in the title, I was really thinking about the culture that we come from at Adobe. At its heart, it's a design company. It has software products, but at the core, this is really what it is. It's also a company that's been around for quite a long time. I don't know if anybody was around in 1982, but that's when the two Johns formed the company.
And they were really early visionaries and very interesting individuals, really similar to the HP experience, that garage idea of "I want to spin something up." I recently was part of a 30-year anniversary of Acrobat, and just hearing some of the story around how they got started: they wanted to create a product for the LaserWriter at that time, which supported PostScript. And one of them just really went with it, wrote a white paper, and all of a sudden he had created this vision, even a little bit of marketing, and that became the product. So we've always had these kinds of innovative ideas, always pushing things. And even from an infrastructure perspective, in the domain that we have, it can sometimes feel constraining because of some of the primitives we work in, especially when you think about the network. That's an area where, for a lot of us who grew up in data centers, we know some of the challenges that come with it. So I love that we're in this newer world where we can actually start to think at a higher level about what's possible, what we could possibly do so that we could ship great products, ship things that we know our customers are going to love, and so that our developers really have the ability to get their features out, but then understand what's happening when things don't behave right or when things scale. That's really the genesis of why I wanted to deliver this talk. But just to give you some background, I'm going to hand it back to Tony and let him tell you a little bit about Ethos, which is our container platform. Thanks, Joe. So yeah, Ethos is not a product. It's an internal platform that we use to deliver cloud native at Adobe.
And the way that Ethos came about: around 2016, 2017, we had developers at Adobe who were going through their cloud native journey much like everyone else. They were trying to figure out: how do I take my application? How do I use Docker to package it? How do I deliver it? How do I get metrics around it? How do I manage it? How does it work in this new kind of setup that's different from more traditional infrastructure? And a lot of those journeys kind of looked like this. Some mistakes were made along the way. The guy there in the back who fell down, that's actually Adobe security when he found out what some teams were actually doing with their applications. And the problem was, Adobe's a big company. I've lost track of how many employees we have at this point; it's 20,000 or 30,000. It's a big global company. So we didn't have just a couple of teams doing this. We had a lot of teams doing this over and over and over again. So the genesis of Ethos was basically: let's have an infrastructure team get together, solve these problems once, and then give this infrastructure to developers as a service so that they can actually use it and get back to developing applications. We started this in 2017. We originally made a big bet on Apache Mesos. We didn't go Kubernetes first, so we were one of the few Apache Mesos users. And the first iteration of Ethos was pretty opinionated. We made a lot of decisions for users. And it worked okay, but what we found was that with such a diverse audience of users, we had a lot of users coming to us who either had no experience or a lot of experience. They either had one container, where they were just saying, give me a pipeline and just host my container, make the magic happen. Then we had other teams coming and saying, look, we've already stood up six of our own clusters.
We need you to give us a cluster that's blessed by security and then we need you to kind of step out of the way so we can do our thing. Have our own pipeline, we have our own metrics, we have everything. And that's something we couldn't deliver the first time around. So about 2018, we kind of saw the light and we moved over to Kubernetes. And that's when we decided that we need to give a platform with more choice. We need to really be flexible for our audience and be able to deliver something where we can take those different users coming in and give them what they need, whether it's the novice or the user who just has one or two or three applications they need to host versus the team where their application is cloud native from day one and that's their entire infrastructure and they need to be geolocated and with data locality, they need to be in certain clouds with certain services and they really kind of have a high degree of complexity around what their implementation looks like. So today with Ethos, workloads kind of look like this. We do a lot of single tenant and multi-tenant. So for our smaller users, we might put a number of those users on the same cluster and we need to be able to secure those users and segment them apart from each other. We have applications that are both cloud native and legacy, I'm using my big air quotes here. It's the lift and shift that is kind of the dirty secret of cloud native. We have a lot of applications that are coming in that existed long before Kubernetes. So those applications needed to come in, they were kind of wrapped in a nice Kubernetes wrapping and then deployed so we need to be able to support those and offer tooling that would support those as well. We have untrusted workloads. We actually have one product in particular called Adobe Experience Manager and the fun thing about this is that this actually allows Adobe customers to deploy their own applications on Adobe infrastructure. 
So we, again, needed a way to be both performant and secure with applications that we really didn't have any insight into prior to them being deployed. So we needed to be really focused on how we package those applications and how we present infrastructure to them. We do stateful workloads. We have a very large messaging pipeline and metrics installation that we use internally. And we also have a lot of machine learning and inference workloads, as well as, more recently, as with a lot of companies, a large generative AI presence, which has been on Ethos since day one and cloud native since day one. Yeah, I would say that's probably the big elephant in the room of learning. And with all these types of workloads, and I think a lot of you who are using Cilium have probably gone through different journeys with the types of workloads, generative AI has definitely pushed us into different kinds of challenges. And I think across the community we're going to probably talk about that a lot more. Cilium held up. We were definitely pushing a lot of packets per second. We're pushing a lot of different artifacts through a lot of complex pipelines. And once again, it's always one of those things in my career where I always feel like, all right, what new thing am I going to learn today? But ideally, I like things to just work. But we did find some areas where we had a few outages, a few things that were happening along the line. So I'm looking forward to seeing Cilium get more features in there. I think also upstream as well, with scheduling and orchestration. There's a little bit more than just the underlying CNI; it's the collective pattern of how we do this. So I'm really excited.
And if anybody's doing these things, I'd love to talk more about this, because we definitely have some areas where we feel like we want to improve. But amazingly, we shipped major product this year; Firefly probably pushed our limits. And even in our journey with using Cilium, we've gone from being initially the largest customer, as Dan mentioned earlier, and back again. And we really love that it's been able to scale. And I think we're going to talk a little bit more about some of those nuances. Yeah, all in all, our fleet of clusters is running, I think today, about 23,000 nodes, just to give you an idea of scale. And that's spread across a lot of different tooling. Just to give you an idea, I was putting this slide together and I looked at it this morning and thought, this is starting to approach the cloud native landscape slide, the slide where the type is like six point and you can barely read what's going on. It's not quite there, but I really tried to shove in a lot of logos here, and we didn't even get all of them. But really what I was trying to illustrate here is that we're giving a lot of choice to our tenants. We're on Amazon and Azure and our own private data centers. And even within the public clouds, we're both EC2-based and EKS. We're Azure VM and we're AKS. We even do some OpenShift. We support virtualization with Kata Containers and KubeVirt. We support different container runtimes with CRI-O and containerd. We support both metrics as a service and bring your own metrics. We even support Windows as well as Linux nodes. And of course we support a number of different architectures with Intel and ARM, and then a lot of different variants of GPU. Anywhere we can get GPU, we are looking to support it. So we support a lot of different options here, and when you look at an Ethos cluster, we often hear, well, what makes an Ethos cluster?
What's your kind of, if you had to include just a few select tools, what makes an Ethos cluster? And it really comes down to two things for us. One is Kubernetes; that's kind of obvious. It needs to be Kubernetes. The other is Cilium. Without Cilium, a lot of how we present the infrastructure and secure it doesn't work. It's to the point where we basically insist upon Cilium as a feature with any new cloud that we're going to. When we looked at AKS, we said, this is great. When can we get Cilium? When we looked at OpenShift, same thing: this is great, let's do Cilium on this. And the reason for that is simple. When we chose Cilium as our CNI, we were, and by the way, we were first consuming it as open source when Joe says we're a customer. We've only really been an enterprise customer for the last two years. We've been Cilium fans for a long time. And the reason for that is, number one, it's eBPF-based. That really gives a lot of interesting opportunities for integration, for insight into how traffic flows throughout the network, which is really important for us as infrastructure operators. We need to have that visibility to be able to work with our tenants to show them how their applications are working or not working. The other was that it was very simple to pick up and very extensible for our tenants. It wasn't just a product that we use; it's a product that our developers use and that our sister teams like security use in order to meet their needs. So, you know, we kind of shopped around for a CNI. We were really excited about Cilium, and we've basically been using Cilium since about 2018, using it globally throughout Ethos. It's one of the few constants we have in our cloud native infrastructure. Joe, I think you have a hot take. I have a hot take. I always have one. I think I've done this. Hold on one second. I just wanted to qualify one thing there.
You know, when Tony was talking about that, one of the challenges as well is that because we are multi-cloud, hybrid cloud, we're consuming this in a lot of different ways, and I know maybe a lot of you are similar. You know, maybe you're in EKS, and previous to that we were running our own control plane, so we kind of own that part of it. We recently had a partnership that introduced OpenShift, and obviously OpenShift has a very bespoke CNI that comes with it, but we've made such an investment, and I think that's where we keep going with a lot of what we're doing, like instrumenting things. We have this kind of domain knowledge and that consistency, and this leads me to my hot take. Thomas, you might like my hot take here. Cilium is boring as a CNI. It is boring. I love it. But another hot take: boring is good. Boring is good. Don't listen to the product guy. Don't listen to him. Listen to the operator. Boring is good, and I'll tell you why. When there's an issue, what gets blamed first? Anyone? Anyone? Oh, it's the network. Oh, it's the network. Oh, my application doesn't work. It's the network. This is the battle cry for developers, and it's something that we deal with in operations on a daily basis. Something's wrong. I think there's something wrong with Cilium. I think we have to look at that. So when you choose a CNI, you really need to choose something that has great visibility and is stable. We really need a product that we can work with that's easy to debug, easy to configure, easy to point at and look: this is how the traffic's flowing, this is how it works. And this is one of the reasons why it's one of the few pieces of the infrastructure where, when we went to EKS, we didn't go with the AWS CNI. When we went to OpenShift, we didn't go with the Red Hat CNI. We insisted on Cilium, because without the network, obviously none of this other stuff works.
All of it is window dressing on top of the network. The network is the most important thing. So it's really important that you invest that domain knowledge into a product and then use the same one everywhere. So as an operator, I really like the boringness that we get out of it. It's more things that I know and that I can configure. Our end users like that as well, in terms of being able to use Cilium to, again, configure their own applications, being able to approach Cilium and configure their own L3 and L4 and L7 policies without having to learn yet another tool or use some sort of bespoke format for writing those policies. It's all JSON and YAML. These are all cloud native objects. It's UNIX, I know this. It's very easy for end users to come up to speed with: man, I just need to connect these two applications, how can I do it? Being able to pick up Cilium is, we have some end user orgs that, frankly, are probably even a little more knowledgeable in Cilium than we are. And that's a good thing. We don't want to be the gatekeepers of Cilium. We want to have a best-of-breed tool that meets our needs and our goals, and Cilium does that for us. It's also very useful as part of our security foundation. I'd say more recently Gatekeeper has caught up here, but in the beginning, we were doing a lot of basic workload segmentation and policy enforcement with Cilium. And again, it's that ease of being able to implement and understand how traffic flows, and then allowing the developer to extend upon that, that really made it the logical choice for us. Did I cover what I wanted? Yes, I think I did. You did. I did. And so really, what I really want to get at more, and this is kind of where, when I'm talking to our overall platform team, our leaders, talking to teams that sometimes are actually struggling with this cloud native thing.
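[As an aside, the kind of L3/L4/L7 policy that tenants author, as described above, might look like the following. This is a minimal generic sketch of a CiliumNetworkPolicy; the namespace, labels, port, and paths are hypothetical, not Adobe's actual policies.]

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-api        # hypothetical policy name
  namespace: tenant-a          # hypothetical tenant namespace
spec:
  endpointSelector:
    matchLabels:
      app: api                 # policy applies to pods labeled app=api
  ingress:
    - fromEndpoints:           # L3: only pods labeled app=frontend
        - matchLabels:
            app: frontend
      toPorts:
        - ports:               # L4: only TCP 8080
            - port: "8080"
              protocol: TCP
          rules:
            http:              # L7: only these HTTP methods/paths
              - method: GET
                path: "/healthz"
              - method: GET
                path: "/v1/.*"
```

The point made in the talk holds here: this is ordinary Kubernetes YAML applied with kubectl, not a bespoke policy language, which is what makes it approachable for tenant teams.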
And I use this image only because we both came from an acquisition where we were in the original Pixar building where this film was made. And Pixar was kind enough to leave behind some artifacts that to me are always very inspirational. I love Toy Story, I love Buzz Lightyear. And I just want to imagine that he probably had Tetragon, and he has Cilium, and he has a mesh, and he has all this cool stuff in his spaceship. But I really think about, okay, what is possible? And this is where I start to trickle down, because we're going through a lot of change. We're seeing it: my roadmap was disrupted this year. I thought I was doing one thing at the beginning of the year, and now all of a sudden I'm going in this other direction. And I'm looking at things like Hubble. And when I talked earlier about the CNI being boring, well, it's because now I'm able to look at higher level things. So could I extend out? And we had this conversation with the Cilium team, with Isovalent, a while back around some of the things I'd like to see. Like, could I start to see, with eBPF, congestion? I want to see all the inner things, so that before we get to these inflection points of failure, could I extend these things down so that we're getting fewer of those toil questions? Can we push these things? Those are valuable insights, as well as just the security aspect of things. Can we start to use these more? When I was writing this up and thinking about it, it was really a memo to myself to say, hey, you're not exploiting the platform as much, because that's really what's evolving here: it used to be just the CNI, and now all of a sudden we have this platform that we can start to build features into our developer experience with.
And so I'm constantly challenging where this is going to take us, especially as I'm wrestling with things like WebAssembly; we have our first team rolling it out, and our Content Authenticity work is based on this. How do I support these teams? And obviously this is growing with us. We're learning things; we just don't know what we don't know. And then the edge is just becoming more and more of a thing. And lastly, the biggest thing, which I'm actually really excited about, and maybe you're also hearing this as well, this is just coming from an internal product person, but all of a sudden now I'm getting a lot of questions around supply chain security. I'd bring these things up on my roadmap, but no one would jump on them. And now it's like, hey, we need mTLS between us and this specific tenant or customer. Or, we need to know how you're building these artifacts. What are the secure things that you have in there? I love hearing it, because as I look at this platform, where I see it going, or where I imagine it's going, is toward a zero trust architecture. So when I see the vision of where these things are lining up, that's where I want to be. And that's where I really try to push my organization: we made this investment. And lastly, you can hit my last slide. There are two other things that I just cannot emphasize enough. I love this community. This community is amazing. I've learned so much in regards to how we tackled some of the challenges that we had. And I think being able to contribute alongside these companies that are investing in this space is very valuable. And I look at the individuals who are in it, who are just amazing, and it's a great community. And I think that's what's really critical, especially now as we've seen some companies where licensing changes happen.
And I was looking briefly at the Isovalent page, and one of the things they say about open source is that they innovate and breathe open source and are fully committed to its principles and values. And I'm going to hold you to that, because I really believe in that statement. I love it. So I'm really excited that we're able to be here. We just gave you a brief glimpse of where we're going. I think in a year it'll be exciting, because we'll really start to ship a lot more of our generative AI apps. We'll probably learn new lessons. And as I mentioned, being part of the community means we bring these things back. We create issues. We really help elevate each other so that we can all grow together on this platform. Any final thoughts? Oh, I think that's it. All right. I don't know if we can end this. Thank you. Amazing. So yeah, we have a fair bit of time, so I'm hoping there might be some Q&A. If you do have a question, there's a mic in the middle of the room. If you are able to go up to that, that would be amazing. I hope you don't mind taking some Q&A. No, we've got time. Absolutely. Good morning. Good morning. My name is Dotsonath Midoi, and thanks for sharing your experience at Adobe. I just have two questions, actually. You showed us a picture of Ethos, your platform. With that size, you must have had some unique problems, especially with Cilium. And there are two things that bother us a little bit. One is, have you ever run out of IP addresses? And two is DNS, DNS resolution problems. My question is, can you just share one or two things that you ran into that were common across all your platforms, and maybe guide us into how you resolved those? Sure. So the first one, well, I'll take the second one first. Did you ask if we ever had DNS problems? Yeah. Yes. Yes, 100%. Absolutely, every day of our lives. Yes. Yes. And we've gone through multiple, like we had some, and this just predates myself.
And we've done things like dnsmasq. We've also hit the length limits; those were some of the challenges. Some of it was us as well, not really realizing it. We had some Alpine container issues as well. So we had to really work through that. But we have internally, it's dnsmasq right now, and then we're evolving toward, as we move toward mesh and such, we'll be leveraging that for more DNS. Not saying that that's going to make our DNS problems go away, but we're hoping that that's the pathway for us to experience fewer of those things. Great lead-in to my question. I was going to ask you, have you had any experience with service mesh, like Istio versus Cilium, and how do you handle the policies? Yes, so as a platform, we've had to think about this one quite a bit. And as I think Tony mentioned earlier, we also are a product of acquisition. So we have one of our solutions which is using Istio; they have certain requirements that require them to use it. And right now we've just started to adopt it. And then as far as the multi-tenancy aspect of it, there are certain elements of mesh that we have in there, but not Istio, meaning we have things like mTLS in there. Now, where I'm going with it, though, is that direction. And I think what's interesting is that as you see ambient mesh emerge, there are elements of Istio, or you could also go the Cilium route, the Isovalent route, as well. And I think we're looking at both of them. So we haven't fully implemented that across the multi-tenant part of the platform, but it is coming now. As I was mentioning earlier, quite a few user requests have come in where I now have to actively implement that. So we're definitely looking at what we have from our existing Istio implementation.
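[Returning briefly to the DNS resolution problems discussed above: one common mitigation in Kubernetes, sketched generically here rather than as Adobe's actual fix, is tuning the pod's resolver options so short names aren't expanded through every search domain. The pod name and image below are hypothetical.]

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-tuned                       # hypothetical pod
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # hypothetical image
  dnsConfig:
    options:
      - name: ndots                     # kubelet's default is 5; lowering it
        value: "2"                      # reduces search-domain expansion for
                                        # external names (fewer lookups)
      - name: single-request-reopen     # glibc-only resolver option that works
                                        # around some parallel-query issues;
                                        # Alpine/musl ignores resolv.conf options
                                        # like this, which is one source of the
                                        # Alpine problems mentioned above
```

Using fully qualified names with a trailing dot (for example `example.com.`) similarly bypasses search-domain expansion without any pod spec changes.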
I think for the team there, from an upgrade perspective, that's probably where some of our challenges are: maintenance. And so we really leverage our partners to help us there. Let's say this team that's implementing Istio is probably the first team that's come to us with an actual user story around why they wanted to use service mesh. We've had teams that have come to us in the past and they're like, we really want service mesh. We're like, cool, tell us what it is. And they go, I don't know, I really want service mesh. I just want it in my soul. And this is the first team that's actually challenged us with an actual user story around that. We do have service-to-service calls that need to be optimized. The thing I'm looking at as well is upstream, the Multi-Cluster Services API. We want to be able to break beyond those cluster boundaries. And I think that's where our opportunity is, what we see not only from the policy perspective, but also what we get across clusters. Were there any gotchas that you had to watch out for? So far, I haven't, I mean, we just started doing our initial POC. So check in with me in a month and I'll probably have a better story for you. Now, you mentioned zero trust. How does that work with Cilium? And how did you implement that? I mean, that's probably a broader story. When I bring that up, I always look at what's aspirational. Where can I get us to? And I look at it as, it gives somebody security observability, and that plays into where I'm looking at it. We have other areas where eBPF plays a role, because, like Tony mentioned, there are untrusted kinds of environments, and we want to have some of those insights.
And so that's where we think eBPF is the area where we can track those in-kernel events that we want to be aware of. So for me, it's many, many things that we're stacking up to get to that overall goal. Are you using partitioning techniques? That is one, that is one of the environments we have to. Yes, not everyone, not every environment. And how about zero day? That would probably be more for my security team, so I'd have to let them tackle that. All right. Thank you. I had a question as well. You mentioned that you have developers who are writing their own layer three, four, and seven policies. Do you give them templates or guidance on how to do that? In some cases we do, in some cases we don't. We have an internal tool, or internal set of infrastructure, a development pipeline that we call Flex. It's based on Helm and Argo CD. It's a way for tenants to come in and extend upon those Helm charts to deploy their own applications. And part of that is they're able to use some of those templates to deploy their own Helm charts for network policy. But then again, we also have some, what we call platform as a service, tenants who come in and just say, I know what I'm doing, just give me the API, get out of my way. So yeah. Tacking onto that question, how do you stop somebody from creating a network policy that allows everything? Do you have a default policy in there, or what is stopping somebody from doing something nasty? What's our default policy? No, what is stopping somebody from making an allow-all policy? I know we have some Gatekeeper controls in place that restrict what users can do within their own namespace, and I believe we also have some. Are you guys using Kyverno or OPA? OPA, yeah, yep.
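[To make the Gatekeeper guardrail idea from the last answer concrete: one way to reject an allow-all ingress rule at admission time is a ConstraintTemplate like the one below. This is a generic sketch, not Adobe's actual constraint, and it assumes "allow everything" means a CiliumNetworkPolicy ingress rule using the `all` entity; the template name and message are hypothetical.]

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: denyallowallingress            # must match the lowercase kind below
spec:
  crd:
    spec:
      names:
        kind: DenyAllowAllIngress
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package denyallowallingress

        violation[{"msg": msg}] {
          # reject any ingress rule that admits traffic from the
          # special "all" entity (i.e. everything on the network)
          rule := input.review.object.spec.ingress[_]
          rule.fromEntities[_] == "all"
          msg := "allow-from-all ingress rules are not permitted"
        }
```

A corresponding `DenyAllowAllIngress` constraint object would then bind this template to `CiliumNetworkPolicy` resources in tenant namespaces; a fuller version would also need to cover empty endpoint selectors and equivalent allow-all patterns.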