Hi, everybody. Welcome to our panel: Where do we go from here? We're talking about the future of the Linux desktop today. We have everybody but Pietro de Caprio from Vanilla OS. I'm going to go ahead and introduce everyone. We have Nova King from Stardust XR, which is an XR display server. We have Matthew Miller from Fedora right here. Rosanna Yuen from GNOME. Lleyton Gray from Fyra Labs. Drew Adams from KDE and openSUSE, and Monica Madden, the community expert.

We're going to start today with the application ecosystem. So here's the first question, and we're going to start with Nova. Many of the major applications coming to the Linux desktop are Electron-based ports of existing applications.
How might we encourage developers to write native applications and attract them to native platforms? Is this an issue in the first place?

Yeah, so I think this is a legitimate problem. Electron apps are really, really slow. They take up a lot of space. They're insecure in many ways, because Electron is often out of date. And honestly, the ports are just never as good as native apps, but it's really hard to make a native app in the first place. So I think what needs to happen is that distro designers need to be flexible about how things look, because people will often be put off from making a native app if it doesn't look exactly perfect. We need more toolkits, and we obviously need more money, time, and effort put in. And I think the community needs a bit of a shift, because some communities are leaning too much towards specific design guidelines as opposed to the actual usability of an application. It doesn't matter how good an application looks or how well-designed it is if people can't use it in the first place. So some leniency, some help, and some money would really help in this case.

Yeah, packaging things that are made for Electron is really hard for a distro, and there are all these problems of security, and there's also kind of a monolithic ecosystem there, which is always a risk. I don't know a great answer. I do think that people are going to want to use web technologies to deliver applications, and I think we're not going to win trying to fight that. I really would like to see some investment in things like Servo and whatever other alternative engines could exist that might be more lightweight, faster, and maybe more of a community-maintained thing.
It would also be nice if people would take care that their web applications don't require some particular version of Electron to run well. If you have that, it should run just fine on Firefox too, and making that work well is probably going to be better than fighting the way people want to develop stuff, because people are going to do that.

Okay, so I'm not as technical as these two, but that being said, any app is a good app, right? We want more users, we want more people using Linux, and part of that is creating more applications that people want to use. So if we can find a way to have more people use Linux and its apps, it doesn't really matter at the end of the day what they're built with. I mean, I understand there are technical issues, obviously, but that just means another hurdle we have to solve, so we just have to get there.

Yeah, I similarly agree. I'd rather have Electron apps, even though they're buggy, than no apps at all, especially for proprietary software like Discord; you're never going to see a Linux-native app for that. That being said, I think there are a lot of alternatives to Electron that many developers already using Electron would be interested in if there were better support on Linux. For example, Microsoft has been putting a lot of work into React Native for desktop, which currently has support on macOS and Windows but not really on Linux. And I could genuinely see a lot of people using something like React Native for desktop on Linux, especially because it offers a similar developer experience to Electron while working a lot better, since at the end of the day it renders using native UI widgets.
So I feel like we should invest not just in Electron but also in other cross-platform UI toolkits.

Yeah, I don't have a lot to add to all of those points. I think we definitely need more investment in tools that support the way people want to work. They already have skill sets; that's why they're using Electron, because they have a set of skills that they want to apply to this problem domain, and that's the thing that gets them there faster. So having tools that support a similar style of work but produce more native output would be better. I personally hate Electron apps, but I would much prefer an Electron app to no app at all. I can just only run two or three of them at a time; that's how they work. So yeah, it's a combination of all these points. I don't think we can fight the momentum; we have to catch that wave and transition it, you know, build tools into Qt and GTK or whatever that allow for these styles of work. I think that's the best path forward.

Okay, and I don't want to repeat a lot of great points, but something that worked in the Ubuntu community when I was there was the switch to the Flutter installer. This was a small app that was open to the community, especially to the flavors, and it could be a really great point of entry. So again, I think this goes to an issue a lot of our projects have: give people good starting points to develop apps, give them good documentation for whatever tools they're going to use, and then give them good projects, like, hey, we need this installer, or we need these widgets. Just give them good signposts on what your distro needs. That's also incredibly helpful.
Now we're going to do a segment where, if you have a response to another panelist, you can make that response now; otherwise we'll move on to the next topic.

All right, yeah, so I've got several responses. I think it was one of you who mentioned needing to have Electron apps, and I really do understand that Electron is incredibly popular and really easy to develop for, but oftentimes a broken Linux app can be even worse than an app not made for Linux at all. It can tarnish the reputation, it can confuse people, and it can be more insecure without letting the user know. But on the other hand, having more apps is generally better. It gets more people to develop apps. It improves the quality of the apps. So it's a very difficult balancing act, but I have to 100% agree: if we make more starting points for apps, then more people can fail faster, and they can build more apps to make more tools to build more apps, et cetera. It's a positive feedback loop. So it's difficult all around. I think that stuff like Flatpak really helps, because it removes dependency issues. It's not a perfect solution, nothing is, but it's getting better over time. The UX improvement from 2008 to now is massive. You can actually put somebody who doesn't know Linux in front of a Linux machine.

Do you have a response, Matthew? Or does anyone else have a response? No? Then send it back to Nova, and we'll move on to the next topic now. Our next topic is sustainability and funding. The question is: what issues do we face when it comes to the sustainability and mental health of contributors? What could we do to address the current faults in the development model, such as maintainer burnout?
Yeah, so one of the most difficult parts about burnout, and I have personal experience with this, is that it often comes when you put a ton of effort into something and don't see many emotional returns. It feels like you are constantly running up the engine and going nowhere; it's a literal tire burnout. And this can come from a community that isn't really well engaged, or from something that is highly abstract and needs to be explained in simpler terms, or something that needs to be more accessible to everybody. I think what would help drastically: more resources to help Linux desktop app creators, and the people who develop Linux itself, get the social and community side of things under control and out there, as well as different avenues to share projects that might be in an early development stage but have potential. Even one positive comment can stave off burnout. You have no idea just how much it means for somebody to say, wow, that looks cool. So, yeah.

There's a lot in the question, can you repeat it? The question was: what issues do we face when it comes to the sustainability and mental health of contributors? What could we do to address the current faults in the development model, such as maintainer burnout?

Yeah, it's a really big topic. Sorry, I'm trying to avoid the plosives and be loud at the same time. I think a lot of it is about recognizing what people bring to projects and making sure that they feel recognized and supported when they show up to do things. It's easy to take people for granted, especially when they're doing things so well. And you're like, okay, now I've got this one person who does this thing.
We've had this pattern happen to us too many times in the Fedora project. The docs team is working really well, the docs team lead is doing all this amazing stuff, and there are so many things that need attention that when something is working well, it's easy to take it for granted. Oh wow, that just goes like clockwork. And then at some point, while you're not paying attention, it turns into that person feeling like they're off on their own, doing it all by themselves, and nobody appreciates them. So making sure that, in all our projects, we look at what people are doing and recognize the people doing the little things and the big things across the project can help.

Some of the greater sustainability things are really hard. Funding, especially for the sustaining work that's less fun, is hard to get, because desktop operating systems are not themselves money makers, so it's hard to attract that kind of funding. But I was talking over at the Thunderbird booth earlier, and wow, were they able to bring in a lot of money just by making a nice little request. And the thing the Flathub people are working on, donations to developers and even paid apps there, is interesting; I want to see how that goes. I would also like to see some of that be directed towards sustainability intentionally, both in terms of community sustainability and just making sure that dependencies get funded as well. Because when you make those top-level asks and that money goes to the people making the flashy applications, all of those flashy applications stand on the shoulders of thousands of other projects. Everybody's seen the XKCD. So it would be nice if some of the higher-level hubs that are taking money would find ways to pass it on as well.
So, as someone who works for a nonprofit: we've already established that funding is important, but I also want to point out that burnout isn't just a maintainer or developer thing. All volunteers burn out, and I've been part of this community for way too many years; I don't want to count, but I know all y'all. I've seen people burn out, and it's really, really hard to get people motivated to stay around. Part of it is that we need to just keep getting more volunteers, because people leave for a variety of reasons. Some of it is burnout, but some of it is that they have better life things happening, which is great for them, sad for us. So we need to keep getting more volunteers, keep getting more people contributing in all sorts of different ways, and obviously money helps a lot. If we can fund some of these people who are helping out, then maybe they'll stay longer.

Yeah, so honestly, I don't even know where to start here. Sort of like Matthew said, I see burnout most in the dependencies of projects. I'm not going to name any projects specifically, but just looking through the GNOME stack, there are a lot of large, crucial C libraries that are basically maintained by the same person; we're talking about something like 40 by the same person. And when there's not really attention brought to these crucial but not really flashy components, what happens is that you end up with one person doing the brunt of that work, and they sort of put themselves in that position because nobody else wants to step up.
And I think a large way we can address this is just showcasing more of those core components, rather than the flashy things the user immediately sees, in terms of contribution at least.

I think with burnout, a big issue is of our own making. When you get excited about something, you want to run with it as fast as you can. And I know for me, it's sometimes really hard to rein that back and say, okay, nope. That will lead to burnout, and I encourage other volunteers that I work with: don't do that to yourself. Harness that excitement, but pace yourself. That's going to be super important to give you the legs for the long term. And in the long term, don't do it in silence or in a silo. Look for your deputies, your helpers. And have friendly invites; always invite people to have fun with you. I'm in the openSUSE project, where we say all the time, have a lot of fun. And if you think about what you're doing, if you're excited, you're having fun doing something in open source. Harness that fun and bring someone along for the ride. Let's have fun together; it's more interesting that way. So don't do it in a silo, don't do it in silence, and don't try to bite off more than you can chew. Pace yourself, and over that long duration, bring people in. As far as financing it all and how to direct that, I think there are interesting things we can do that haven't been explored in business models around open source, and I hope we'll see more innovation in that space over time.

And this is going off a previous comment, but besides giving people good, well-documented on-ramps to the project, I think we should give long-term contributors temporary off-ramps. It's like, okay, you had a kid. You got a huge new responsibility at your paying job. You have a health crisis.
And it helps if you have something in place for your project where you're putting that knowledge and skill into processes, so it doesn't live in just that one person, where everything good goes with them when they leave. Instead it's: okay, we've got this. You've documented your work, or we've sat you down for a brain dump and we've documented your work. You go take care of this, whether it takes you weeks or months or years, and come back whenever you want. And then that process can guide the next one or two people who step in to take that person's place. And also, I think, try to set up formal or even informal mentorship programs in your projects, so that those people, before they burn out, have someone they can talk to, even as they are starting to bring in the people who will become them in the next five to ten years.

Now is the time that if anyone has any responses to a previous question or answer, you can respond. Yeah, go ahead.

I just wanted to add: you said something about mentorships, and this is something I've recently brought up with some of the people in GNOME. We haven't actually done any of this yet, but we're thinking about it. It's not just mentorships and how to onboard people; we're trying to think of a way to teach people how to be a mentor and how to onboard people. That's something I really want to think more on and maybe find some way to do. Let's talk more about that.

Yeah. I want to build on what people are saying about burnout. Burnout is something that happens when you have passion. If you have a day job you go to, where you're building some widgets from nine to five or working at some office, and what you care about is your hobbies in the evening, you're not going to get burned out at that job. You might get tired of it, or bored, but burnout is not what you'll experience.
But I think everybody here has passion for community and for software built by community, open source and free software. There's something that really drives us, and that passion and love has the flip side of putting you at risk for burnout. One situation that can really happen is when you end up feeling like, I'm the only one who can do this; if I stop doing this, it's all going to fall apart. And so I think that goes to the themes of mentorship, support, and inviting other people in, and setting up projects so they're structured such that people don't feel alone, feel they can step back, and know that it's going to be okay if they're gone for the weekend, for a month, or for five years while they raise a little baby. That should all be okay, and you should know that it's going to keep going; it doesn't all just depend on you. I think that really helps buffer against at least one big source of burnout.

Yeah, I definitely have that problem. The project I'm working on is Stardust XR. It's a three-dimensional user interface for the new AR and VR headsets that are coming out, and for Linux. Yeah, the demo failed, but it's fine. But the thing is, in my case there really aren't any other projects like mine. And I don't want it to be just my project, but a lot of the other XR desktops, as they're called, have kind of fallen; they've been archived, they haven't been updated in years. So sometimes it does actually feel like you're one of the few people working on this. And you know what would really help with that? Support services. Because right now, if you have a problem in open source, like some company is basically saying, hey, would you like to change your license and we'll donate money to you, who do you call in that scenario? There's no hotline for it.
So having support services, having nonprofits that teach you, or just having people there to talk to who have experience with these really tough problems, would help immensely. There are so many times where I've been in situations where I just don't have the experience, and I don't have time to get the experience because the problem is immediate, and I just don't know what to do. So I basically have to have an anxiety attack, then blast out the question to everyone I know, and eventually somebody helps me find an answer. But it's highly inefficient, and it accelerates burnout like you wouldn't believe. So: support services.

I'm really sorry to cut you off, but we do have to move on to the next question; we're running long on this one. The next question is going to be really great for some of our non-technical panelists. Non-code contributions are often neglected. How can we better encourage these kinds of contributions, like translation and design?

Yeah, so this is a particularly difficult one for me, because I do a lot of code, and I actually had to code in order to do design. I'm a UX designer, and the tools that were available were simply not good enough to do the UX design, so I had to learn deeper programming than I already knew in order to do that design. In my opinion, what would really, really help: we have design isolated in little corners. We have discussions about design that are in their own walled-off silo, and we have design teams that are in their own individual channels, but they're surrounded by a sea of developers. And if you're a designer and you want to be able to talk to teams that are very different from yourself, it's particularly difficult.
In a lot of projects that are filled with developers, when I'm putting on my designer hat, who do I talk to? Because I may not know all the intricacies of the code, and they may not know the intricacies of the design. So getting the two to talk together is particularly difficult. I don't really have an answer, to be honest, but I figured I'd express some problems I've had.

I could talk about this for like five hours, but I'll try not to. This is an incredibly important topic, and it's a weird thing at companies. I work for Red Hat, which has twenty-something thousand employees, and of those, there are maybe 5,000 or 6,000 in engineering, and probably half of those are actually in technical engineering jobs. So maybe a tenth of the company is developers, and yet the company understands that those other people are important too. In open source projects, and especially when companies invest in open source projects, we tend to think, oh yeah, we invested in the development part, and neglect the other 90% that you actually need to make an organization run. Open source projects are not like a company in every way, but you still need design, documentation, people support, events, planning, project management, and so on. Even though we don't sell things, those sales and outreach kinds of roles are important in all of our projects, and we really have undervalued them. I think we're starting to get better at it, but it's something that needs more and more.

It's interesting, from a nonprofit perspective: a lot of people that come and volunteer in the GNOME project start off in one of the non-technical roles. They come in and say, oh, we don't know enough about the code, how can we help you?
And then once they get into the community, they help out, they learn the code, and then they move on to other parts. So we haven't been able to retain a lot of the people that come in in these non-technical roles, because they're not really non-technical people; they just start off there because it's a good foothold for them. It's an interesting question, how to get people who are non-technical to volunteer. We've tried. I remember a few years back, we did a workshop inviting lots of college PR folks to come and help us, and we would help them, and it didn't really work. We didn't know what we were doing well enough, apparently. But this is a problem I would love to learn how to solve. So, thank you.

Yeah, so from my own personal observations: I don't do non-technical things, I'm mostly on the code side. What I've noticed when I do look at the non-technical side is that it's difficult for newcomers to on-ramp, even with the amount of effort that has been put into that recently. I still see a lot of hurdles. It's not like the technical side, where you have a Git repository with a bunch of source code and anybody can just take a look. Non-code contributions generally require much more context, which means a more complex on-ramp process. I also noticed, similarly to what Nova pointed out, that a lot of the non-technical communities, especially when it comes to design, are pretty insular within each of the projects. I'm not in GNOME or KDE design, I don't do design, but from the outside I see a lot less collaboration there compared to code and other technical aspects.
So I think those are the main two things: reducing the insularity of the non-technical portions of the community, and, while I don't know what the solution is for the on-ramp problem, figuring out some way to provide that context better so people can get started.

I love getting people involved with open source who are not technical, who maybe would never have imagined that open source would be a place they'd participate. I would argue that almost every single person in the world wants to feel fulfilled by the things they do, and we have a space for all of them to contribute: a space where they can let their interests drive their skill development and go far beyond what they ever imagined their skill sets could be, because that entry point can lead to other sorts of contributions. And I think we just need to invite people differently. We need to build tools that lower the barrier of entry, that increase transparency in our projects, and that allow people without technical skills to find that this project needs a lawyer to review its contracts, or to give support to a developer getting an offer to change their license. A lot of people have a perception of open source that is simply, oh, I'm not a developer, I can't contribute, and that is just not the case. It's the story we tell that creates that perception, along with the way we invite people to participate. So we have to change the way we present ourselves in multiple ways: build tools that lower the barrier of entry and increase the transparency of our projects, and talk about and invite people into our work in a way that's friendly and encouraging, telling people it's okay to not know; that's why you're here. And I think almost everyone will say, yeah, sure, I'll do that, and apply their current skills and develop new ones along the way.
But we haven't done a good job at changing any of those things, and it's getting harder with things like Discord and Slack. We're getting more fractured and siloed in our forms of communication, and it's getting harder to manage. It's so complex that we're not doing a good job reining it in, and that's a big problem.

Oh my gosh, I agree with so much of what Drew said. I know Heather Ellsworth is here somewhere, yay. One of the things we did together at Canonical, and that we're doing together now at Thunderbird, is community office hours, where we highlight community contributors, and we look specifically for non-developers, because we realize that those people are so important. And I think even the way we talk about it matters: instead of just saying non-code, say we need design contributors, we need legal contributors, we need accounting contributors. Someone asked about that earlier, and yeah, having people who know how to handle money is really helpful for projects. So spotlight them. In these video interviews we've done, we ask them about what they do and, just as importantly, ask them what their story is. Because if people can watch or read those and hear their story, they can say, that could be mine. You give people the narrative that open source isn't something scary, that it's something that can be for everybody and should be for everybody. So the power of story, especially for this, is really important.

Matthew, you can go first. Like Drew was saying about the on-ramps we give people: for a specific example, if you get to a project and its how-to-get-involved page has make your first code contribution as the first thing you hit, which is very common,
everybody who isn't a coder thinks, oh, this is not for me, right away. You're filtering people right out. So that's something you can look at if it's something you're concerned about in your project. In Fedora, I think we've done a pretty good job with a come hang out with us approach: come to our release parties; the first thing to do is make friends and hang out with us, rather than trying to find what technical thing you can do first. I think that's something we're working on. Some of the other on-ramp things are hard. And yeah, Monica, what you were saying about the negative framing of this: I got, as one does, called out on Mastodon about it. I said non-code contributors, and people were like, why are you making this a negative? I think it's partly because it's this weird problem we've made for ourselves, where everywhere else in the world, that's just everything; that's just people. And we've focused so much on this coding, developer, engineering thing that in the past there's been a sense that you're not really part of the project unless you've contributed code. We've gotten better about that, but we still don't really have the language, and I don't have an answer. If anybody's got a non-negative term we can use, other than listing each thing, which ends up being a list of 500 things, which is also not possible, I don't know.

So, just responding to that point: generally I just call them contributors. I don't really make a big distinction out of it. And so, Stardust is mostly centered around a Discord server. I'm trying to change that, but a lot of the community building is much more difficult in other places, because people don't go to forums much anymore, as nice as it would be if they did.
And so to that end, there's a contributor role, and whenever somebody helps out in any way, they're a contributor, and everybody can see that because their name is purple. And one of the biggest on-ramps in Stardust is that when you join the Matrix space or the Discord server, there are a bunch of channels that you can just hang out in. There's the aesthetics channel, where you can literally just dump, ooh, I think this will look really, really cool. And you can contribute in a very, very small, tiny way just by saying, I like this idea. And even if it doesn't get in, just having a discussion about it is very helpful. There's concepts: if you have a more fleshed-out idea, you can put it there. And it's just wide open. There's a management, marketing, and money channel, which is very nice if you're inclined toward that, and it's all incredibly transparent. If you're having a hard time doing a particular thing, the entire atmosphere is very suited toward: you can say it, and people will understand and be understanding. I feel like a lot of the contributors that contribute to your project reflect how your community is structured. So if you make a community that doesn't just feel like a group of developers, you're more likely to get people who aren't just developers. So it's a feedback loop, for better and for worse. All right, we're running a little bit low on time. So we're gonna do one more normal-length question, and then we're gonna do one a little bit faster. This question is about community. Casual toxicity can be common in some Linux spaces, even when we ultimately want the same goal. How might we better address this behavior? Oh, toxicity. Would you like me to repeat the whole question? Right, so I'm particularly lucky that Stardust hasn't really had to deal with that much.
However, the limited times it has had to deal with it, most of the time you talk to the people who are most affected by that toxicity, whoever is being hurt by it, and then you decide: is it really worth it? And 99% of the time, no, it absolutely isn't. Sometimes people are misunderstood, but you still have to take them out of the conversation so they don't spread toxicity, and you can talk with them to the side. But yeah, overall it just depends on the situation, but cut it off before it spreads is, I think, my biggest recommendation. It's another one that I could talk about for a long time. I'll try to be quick. I think a lot of it is a matter of setting expectations and norms in your community, and when you see things that are not okay, talking to people about it. I also really believe that people can grow and learn, and most people have that capacity, and at the same time you have to look at impact first. And it's not our job to teach everybody how to be a better person, as much as it would be great if we could do that. So it's always a hard balance to do this right. We want people to be able to make mistakes, and we don't wanna kick someone out just because they are having a bad day. You wanna be understanding of people, but people have to realize that when they're having a bad day, how they act on that day really affects other people and can have cascading effects. It's just a lot of work. Speaking of things that lead to burnout, I'm gonna now pass the mic on. Yeah, I did like a one-hour talk on this a few months ago. So yeah, I mentioned earlier I've been in this project for many, many years, and so me and pretty much everyone else in this room has survivor bias, right? We all think we could handle whatever it is that we see and whatever toxicity we've been trying to stamp out, because we're all good people, I hope. So yeah, it's important to listen to the new people that come in.
If they say that there's something wrong, you have to trust them, you have to hear them out and see where it is. Because I know I've been super lucky to have been part of this community for so many years, and I know that people have protected me, and I know why I'm still here, and I try and do that for other people. But part of the bias is that I don't see everything, and I know I don't see everything. So we gotta make sure that we keep our ears open and let people know, especially newcomers, that they can always speak out and always let us know what's going on, and we will listen. That's the best advice I can give. Mostly agree with that. On a systemic level, I think inconsistent enforcement tends to be an issue, especially in large open source projects where you don't really have the team to do moderation that well, right? You get cases where people do end up being toxic, and maybe because of their status within the project, or because they've been here for, let's say, 10 years, that sort of behavior is allowed to slide. While it's important to retain contributors, and it's important to recognize that people sometimes have a bad day and to recognize the good that people have done for a project, we also have to recognize that it is overall better for a community to let somebody go who is toxic, even though they've contributed a lot of code, for example. It's not an easy decision to make, but we do have to be much more strict about enforcement. When it comes to an individual level, what I would say is just: when you see somebody that is getting a lot of heat, or is a victim of toxicity, just message them honestly, see how they're doing, just check in. Honestly, that helps a lot, from when I have received toxicity in the past myself. So that's all I have to say for now; otherwise I'd probably be going on for hours.
I think the first thing we have to combat this is going back to that point I made earlier about the narrative around inviting people. The way you present your project, the way you present your ask for help, and the people you attract with it is going to shape your project. The tone you set, others will follow. And the first position I take on meeting anyone is: you're good, you want good things for you and for me. It may not be true, but it's where I have to start, because I don't think I would be right to assume otherwise. But that doesn't mean I have to let you plow over me or plow over someone else. And if I see something that I think is toxic, or just generally seems a little weird, I might just need to have a conversation with that person, but it's a conversation about understanding. It's about understanding that person, what they mean, where they are emotionally, and what their belief maybe is about what they just did or said or whatever. And I think that's an important thing: you get a little bit of grace, but you can't abuse it, and you have to learn and grow. And I really think we can't just let things like that slide. Someone has to moderate it, someone has to actually step in and do that. And it's like no one's job, so who does it? Is it the maintainer? And if you succeed in having a diverse group, you have a diverse set of communication skills, language barriers, cultural differences, perhaps just perspectives on the world. So it becomes a hard thing to moderate, and there isn't an easy answer. If there was a silver bullet, it would have been used by now. And I agree with many of the points here that when these incidents come up, the first step is a conversation with that person. And I think this is where it helps to have codes of conduct that are clear, that say: okay, these are the behaviors that are not a part of who our community is.
And just be explicit, because I know it's easy to have kind of vague ones that are just like, be nice. And it's like, okay, that really doesn't help. So it's like, okay, hey, this is who we are. This is how we are going to be in community with each other. And if something happens and you have a bad day and you hurt someone, we're gonna talk with you and say, hey, what's going on? But also have a series of steps. And if that bad day starts turning into a bad week or a bad month, have clear steps: okay, if this behavior persists knowingly, then there are documented steps that we're going to follow. And so it becomes process and not personal, and have, again, that grace that if this person can grow and become that better person, that good person we want them to be, it's like, okay, you can come back. But if you're in a stage where you're actively harming the others you're supposed to be in community with, we are taking the health of that community foremost. And I think having just clear guidelines on that is really helpful. We're not gonna do any responses for this question, because we're running low on time. So the next one we're gonna do pretty quickly; I just want to hear like maybe 30 seconds of an idea. What areas or features should we be prioritizing for standardization? Like, we've recently standardized accent colors, and we're working on high contrast. What features can we work to standardize? Just a couple of seconds of answers from each of you. Right, so I want to add that all of us are coming from different projects with different sizes. So smaller projects often will use more standards than they'll make, but as they get bigger, they'll create more standards. In the case of Stardust, which is for AR and VR, it would really help if every single compositor would accept the agreed-upon standard for VR headsets, for example.
It would also help if we added to standards: for example, 3D icons for different applications and such. It's not the most important thing overall, but it's incredibly important for the project I'm doing. Okay, keep things quick. eBPF programs: who gets to load them? What's the policy? Can my game change how the kernel behaves? Let's find out. We don't know. I'm gonna pass on this one. I'm gonna keep this quick: accessibility. I just think we have to listen to people using the software, understand where they're coming from and the challenges of the people actually using the stuff, and make the decision from there. A response without listening, I don't think it would be very informed, so. I'm going to add a plus one to the accessibility comment. And was that a timer? Okay, all right. And that's accessibility for things like screen readers, but there's also things like cognitive accessibility. So just consider that, and also the ways that we can work together on those standards. How can we do these things in cooperation with each other so we're not having to duplicate effort? Because, let's admit it, we don't have that much time. So when we can work together on these standards, let's try to do so to the maximum extent that we can. All right, now we're going to wrap this up in a pretty little bow, this very difficult discussion. How can we work together to solve the problems we discussed today? A big part of the world changing over the past 20 years, just like the Linux space, has been globalization and more interconnectedness. How can we keep working together as a team to solve problems that affect the whole ecosystem? Let's do one sentence for this. Just keep on improving, keep on adapting.
I think conferences like SCALE are really important, bringing people together like this, cross-community connections that don't happen otherwise. Let's make sure we keep doing these kinds of things. Also, thank you very much for moderating this and putting this together. These have been great questions that we could have obviously talked for five hours on. Yeah, keep all those conversations going, and I would love to keep talking with all y'all about all of these topics and even more, so it'd be... In short, just conversations like these, especially across communities and distros. Yeah, cross-community collaboration and working together on the problems of openness and transparency in open-source projects would go miles. Reach out to a project that is maybe related to yours, and if there's a line of communication that isn't open yet, be the one to open it. I'd like to thank everyone on the panel for coming today. We really appreciate you coming out. I think some of you were already here, but I really do appreciate everyone coming out today. Thank you to everyone for coming in and listening to us. We really appreciate it, and we need everyone's feedback to keep moving as an ecosystem. Panelists, if you guys could stay for like two minutes in the hallway while we pack up? We want to chat for a moment before everyone leaves, and then we're gonna let the room go for the next talk. Anyone know what it is? Woo, Thunderbird! Yeah, thank you guys. Popping from a handheld and a Countryman to... Sorry, from two handhelds to a handheld and a Countryman. All right, so let's just see the levels. Test one, two, test one, two. Actually, this comes in just about perfect anyway. For those of you slightly in the back of the room, how does this sound? Sounds pretty good to me. Is it a little bit hot, or are we good? Sounds good. All right, so I don't have to make many changes. Do you prefer the handheld or the headset?
So let's go ahead and have you put that on, inadvertently going the other way. Does that make sense? Uh-huh. How about that? Now, go ahead and speak a little bit as if it's part of your talk. Uh, Thunderbird for Android is really awesome. All right, that sounds pretty good. Cool, awesome. That was easy. I need to futz with my computer for a moment. Okay, is there anything else I can help you with before I disappear? So I'm just gonna go ahead and turn this off so it doesn't pick up noise while you're working. Test one, two, test one, two, yeah. These, you gotta hold them fairly close to your mouth, like this. If you hold them out here, it picks up nothing. Which is fine; it's designed for people to sing like this and go right into it. And don't do this either, where you hold it next to your chin. It's gotta be like you're trying to project through the end of the microphone. The quality drops a little bit when you go here. So, you know, you can see how quickly it diminishes. Or if I'm up here, it stays pretty good. Can you guys still hear me? We have like seven minutes until this thing starts. So, hmm, Thunderbird for Android is awesome. Gonna talk about it. But I'm gonna talk about Thunderbird in general. So a lot of people here at our booth did not know that Thunderbird was still alive. And we got that kind of commonly at FOSDEM too, a month or so ago. And so I think it's really exciting to be here and to say, yes, we are alive. And we're doing really cool things, and you should pay attention. And a lot of people have used Thunderbird in the last, you know, 10 years and then dropped off for reason XYZ. And I wanna convince people to come back. It's worth it for lots of reasons. Oh, hi. Depends on what they are, maybe. You can do things offline. You can download them. You still will have to. I download them from IMAP. Mm-hmm. Right, into what looks like a maildir. But I already have them in a maildir. I don't know the actual answer to that.
I mean, how, when was the last time you tried? Because I know that you can. Oh, okay. Okay. I mean, you can import, like, mbox files. Yeah, sorry. I don't know if we can serve your specific use case. Okay. Let's see. Five minutes. Sorry. Anybody else have specific questions about Thunderbird? This thing is really big. Well, yeah. So Thunderbird is still alive. And it's a great email client out there that you should all use. What's up, Callian? I'll make it for me. Mm-hmm. Yeah, I don't know, it's 30-ish. 30 people? Yes, definitely. Yeah, no, it's great to be here to say, like, yeah, we're not dead. Yeah, so generally, people, including myself when I joined, don't really understand the relationship between Mozilla and Thunderbird. Mozilla Foundation is a nonprofit umbrella that has two for-profit entities underneath it. One of them is Mozilla Corporation. That's Firefox. That's a thousand people. Then there's MZLA. That's Thunderbird. That's 30 people. And we're two separate legal entities that try and make cool software to benefit the masses. Good grief, four minutes still. 100% user donations. No, we're all remote. I think the mailing address is technically, like, California somewhere, San Francisco, I think. Yeah, but we all work remotely. So out of the 30 people, there's a surprising number of people in Canada, and Vancouver specifically, yeah. And then several in the US, some in Europe, some in New Zealand, one in Australia, yeah. What's that? Yeah, so for Thunderbird for Android, we're moving to adopt Material 3, the Android toolkit, across the board. If you have an app, and there are apps out there with a shared code base that work on Android and iOS and as desktop applications, there's often a performance hit that you see there. But if you have native apps, then you can mitigate quite a bit of that. So it's all a trade-off, right? Would you like to volunteer?
Well, I would love it if he reached out on our Matrix channel, and then we could connect you with the right people to review those merge requests. We do, we have many, and I will point to them, okay? Right, so there's a minute to go. I think I'll just start a minute early. So I'm here to talk about Thunderbird for Android. Who the heck am I? You all came here to see Ryan Sipes. But he's in Colorado. What's that? Hi, Kyle. Yeah, so, a little bit about me. My name is Heather Ellsworth. I've been at Thunderbird for about seven-ish months. I came from Canonical before then. I am the senior developer relations engineer. Ryan couldn't be here because there were white-out conditions in Colorado, where I also live, but I came a day earlier, so I missed the snow. So I'm here to pretend to be Ryan. And since I'm the developer relations engineer, here's a room full of developers that I'm gonna relate to, to talk about this cool thing. Yeah. Okay, so we have to start with a primer. And so I'm sorry, but welcome to Open Source Email Clients History 101. I am glad you are all here. Thunderbird, you guys know what that is, right? There have been a surprising number of people that have stopped by our booth at SCALE that don't know what Thunderbird is. And so it's been kind of exciting to talk about, just kind of introduce them to the concept of an open source email client. But a quick TLDR, if you don't know, is that we are an open source email client with a vast and rich history, about 20 years of history. And we came from Mozilla and have kind of spun off into our own little thing. So this is what Thunderbird Desktop looks like today. The Thunderbird Desktop application recently got a facelift with our latest ESR. It came out in the summer. It's nicknamed Supernova, and the version is 115. Because we are still kind of married to Firefox and the code base and the build infrastructure and all of that, we follow the same release cycle. So there's one major release every summer.
But we're not here to talk about the desktop Thunderbird. We're here to talk about the elephant in the room. So, way back when, the smartphone was born with the first release of the iPhone, and the popularity of smartphones has only increased over time. So yes, we are late to the party, but we're here now and we're not backing down. Okay, so we were very late to the party. This is exhibit A. You can see when we first revealed our intention to bring Thunderbird to Android. But to be fair, in our defense, we were going through some stuff. So some of you may remember that Thunderbird spent some time in the wilderness getting our stuff together. We should have been dead, but we have been revived and we're thriving. So let's take a little trip down memory lane. Once upon a time, in 2012, Mozilla ejected us from the corporation, and we became a community-developed project. For these reasons, those were really hard times. We certainly weren't thinking about mobile. We were just trying to keep Thunderbird desktop available to its users. Mozilla no longer wanted to employ anyone to work on the project, to actively maintain it. So it was often in an unbuildable state, and there was really no vision. It was just kind of floundering for many years. But then it was because of our dedicated community of volunteers that we were able to remain alive and begin to rebuild. There were a bunch of people that took over maintenance and elected a council. We hired some folks slowly but surely with our little resources, and we asked our users for donation help, starting with a donation appeal on the website and then eventually an in-app donation appeal in 2022. And what that means is, when you open Thunderbird on a fresh start, it will have a little message that tells you, like, hey, we are funded by less than 1% of our user base, and we're 100% funded by that.
So if you love this product and wanna contribute to it, here's some ways you can get involved, and please give us money if you value it. And so you can see that that made a large impact. So in 2022, that was the first time we asked for money inside the app, and we tried to not do it in a gratuitous amount, right? It's not every day, it's not every five minutes; it's the first time you run Thunderbird on that boot. So that was huge, and it gave us the foundation we needed to continue on and to grow our vision. We got better at telling our story and got really focused on improving Thunderbird. So that brings us up to the year 2022. But just kidding, right? We're here to talk about Thunderbird for Android. So there's two things happening in parallel. One is kind of the history of Thunderbird. Now let's talk about K-9 and Thunderbird for Android, and how that got started. So while we were focusing on getting our stuff together, a little app was born that was focused on mobile, called K-9 Mail. K-9 Mail was started when Jesse Vincent wanted to make a patch to the original Android email application in Android 1.0. Back then, the creators of Android didn't really understand how to handle community contributions, so he ended up making his own app and sharing it with the world. So it chugs along for over a decade, and then eventually Ryan met up with cketti, who at this time was the lead developer that had taken over the project. At FOSDEM, they started to talk about, maybe we should work together, join forces, let's see what happens, right? Ultimately it made sense to work together instead of developing a mobile client from scratch. K-9 had a sustainability problem and needed help, and Thunderbird didn't have a good plan for getting Thunderbird on Android. So, for real, back to 2022: we adopted a puppy. Everyone loves a puppy. And so this was the announcement that we alluded to in a previous slide. Okay, so why?
Why does there need to be a Thunderbird for Android app at all? Like, why did we decide to work together? Well, there's a lot of people using Android. 94% of the world has a mobile phone, and three billion people are using Android. It's the largest single computing platform by number of users, and it has a Linux kernel. Woohoo. Globally, Android takes about 70% of the mobile operating system market share, and that's 48% of the world's population, full stop. That's a lot of people that we can reach, and email's never going away, for better or worse. And there are limited options out there. So our main competition is Gmail, default on Android devices, and Apple Mail, default on iOS devices. There is FairEmail and a handful of other open source Android email clients that exist, but not many, and not many with a reliable source of funding that is sustainable, like Thunderbird. We have values that align with a lot of people. I think you all might agree, since you're here: we don't steal data, there are no ads, and we don't do anything nefarious. Selfishly, we wanted Thunderbird on our phones, and this is how good software starts, right? It's through selfish reasons. Okay, so you need to be convinced to use Thunderbird for Android, or K-9. Okay, well, today K-9 Mail does several things. There's a unified inbox, just like the Thunderbird desktop application. So you can have multiple email clients, or sorry, emails, not clients, my bad. You plug in your many emails, and you can see them all in a unified view. It's very customizable. The swipe actions, so you can swipe just like on Tinder, but K-9 will never reject you. There's powerful settings, just like in the desktop client. It's very customizable. It's privacy-respecting; like I said, your data is yours, and you are the person, not a product. And it's provider-agnostic. You can put in your various emails, whether it's Gmail, Yahoo, Hotmail, Proton Mail.
Actually, Proton Mail we can't do in K-9 yet, but we hope to eventually, maybe one day. You can do that with your desktop application, as many of us do. And then, of course, it's open source. You can check out monthly updates that our main developer publishes on blog.thunderbird.net, and I'll provide a link to that later. But yeah, so K-9 Mail is already in a strong, usable spot. Now, when we look at Thunderbird on Android, one cool thing that's coming is sync. So the idea is, with a Mozilla account, you can sync your account settings and automatically have all the accounts that you set up in desktop in your fresh install of Thunderbird on Android. You just log in with your Mozilla account, and all of your filters, signatures, other settings, emails, all of it is there immediately. It's end-to-end encrypted, so we can't see any of your data. Calendar invites that hit your inbox should behave properly, and your contacts data will be available in your Thunderbird for Android application. At one point, both Thunderbird and K-9 were ugly, let's be honest, and they hurt the eyes to use. We are continually trying to make it better and more accessible. Undo actions: oh crap, I deleted that email, I didn't mean to, undo. This might not be available in the first release of Thunderbird for Android, because it comes with many potential issues that we need to plan for extremely carefully so we don't mess that up. But it will be a feature eventually, if not in the first release. Foldables are becoming more common. So if you've stopped by our booth, booth 215, you've seen that we have Android tablets. So you can have K-9 on your Android tablet, but foldables are becoming more ubiquitous. So we wanna have a great experience on the larger form factors, as well as traditional phones too. And then we need to complete the adoption of the Material 3 Android toolkit.
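The end-to-end encrypted sync described above boils down to one property: settings leave the device only as ciphertext, so the server can store and relay the blob without ever seeing its contents. The sketch below is a toy illustration of that data flow only, not Thunderbird's actual implementation: the field names are invented, and the SHA-256/XOR keystream is a stand-in for real authenticated encryption (AES-GCM or similar).

```python
import hashlib
import json

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream for illustration only -- a real client would use
    # authenticated encryption (e.g. AES-GCM), never a hand-rolled cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice decrypts again.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# The settings the client wants to sync (illustrative field names).
settings = json.dumps(
    {"accounts": [{"email": "user@example.com", "protocol": "imap"}]}
).encode()

key = b"derived-on-device-from-the-account"  # never leaves the device
blob = toy_encrypt(key, settings)            # what the sync server stores
assert blob != settings                      # server sees only opaque bytes
assert toy_encrypt(key, blob) == settings    # client round-trips with its key
```

The design point is that the key is derived on the device and never uploaded, which is what lets the speaker say "we can't see any of your data."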
Right now, it's honestly a little bit of a mess. There's a mix of Material 1, Material 2, custom XML layouts, custom colors. And that just comes with an older code base. So we're trying to standardize all of that on Material 3. Okay, so what's left to do before we finish that rebranding? There's still some UI improvements, and we'll show some pictures in the next slide of where we're going. The folder drawer: the default implementation of the unified folder, alongside the ability to manage and organize long folder lists. Plus the message view: improvements to the content rendering of a single message view. And again, we are almost done with porting to Material 3, but we need to complete that. And then we're simplifying the settings. You may have noticed through the history of Thunderbird Desktop that we have a lot of settings. And so we're trying to nip that in the bud quickly with the Android product. So let's simplify them, improve and organize, to offer simpler and more intuitive paths for customization and control, because that is still really important. Okay, so here's a few different screenshots. The rightmost one shows our visual goal for the first release of Thunderbird for Android. The leftmost picture shows the new message view UI, with better visual organization and a little pill to highlight the current account. So if you have multiple accounts; this one is from Thunderbird Ryan. You can kind of see where this is going. And then in the middle, we show the bottom drawer, the bottom sheets, to better organize information about the senders and recipients. So we're trying to make this as intuitive as possible. Okay, so what's the plan for getting there? All of the mentioned updates will be applied to the current K-9 application, as well as to Thunderbird for Android. So they're going to be two separate apps. There's no way for us to migrate a profile from one to the other, unfortunately.
But we're going to maintain these two apps in parallel with no extra engineering effort, thanks to Material 3. Once we're happy with the upgrades for K-9 and feel like they are up to par with where we want Thunderbird for mobile to be, more feature-complete with the desktop, then we will release Thunderbird as a separate app alongside K-9. So it's the exact same code base, except the theme is the only difference there. There were a lot of existing K-9 users that loved it, and when we announced our plans, they were like, oh no, you're going to kill K-9. And we're like, no, we're not. They're both there. You can choose which one you love. They're going to be the same, with just a slightly different theme. And we plan to keep them running side by side for the foreseeable future. So we were just at FOSDEM, and if anybody's ever been there, it's hectic, it's insane: 8,000 developers, and they all had questions. So here are just a few of them. Exchange support for Android. At FOSDEM, we gave five talks. One of them was about our native implementation of Exchange in the Thunderbird desktop. That's a big deal, because anybody that's had to use Exchange servers for work, which is unfortunately common, would have had to use a paid add-on to get that finicky support. But we are baking that in natively, so it is free, and it's written in Rust, which is new and shiny. So we need Exchange support in the Android app as well. And it appears likely that we'll be able to leverage the work that we've done for the desktop on Thunderbird for Android, just because our architect is writing the Rust libraries in a modular and reusable way. Currently, there is no bridge for Proton Mail on Android. Like I said, I really want that. If that changes, we will work on that. But right now, there's no plans there. What about Thunderbird add-ons? I mean, add-ons slash extensions, they've been around for a long time.
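The "modular and reusable" layering mentioned for the Exchange work is a familiar pattern: put each mail protocol behind a common interface so any frontend, desktop or Android, can reuse the same protocol module. The real code is a Rust library; this Python sketch only illustrates the shape of that idea, and every name in it is hypothetical rather than taken from Thunderbird's source.

```python
from abc import ABC, abstractmethod

class MailProtocol(ABC):
    """One protocol implementation behind a shared interface. Frontends
    depend only on this interface, never on a concrete protocol."""

    @abstractmethod
    def fetch_inbox(self) -> list[str]:
        """Return the subject lines of the inbox."""

class ExchangeProtocol(MailProtocol):
    # A real implementation would speak the Exchange wire protocol over
    # HTTP here; this stub just stands in for that work.
    def fetch_inbox(self) -> list[str]:
        return ["Welcome to Exchange"]

def render_inbox(backend: MailProtocol) -> str:
    # Any frontend (desktop UI, Android UI) can call this with whatever
    # backend it was given, so the protocol module is written once.
    return "\n".join(backend.fetch_inbox())

assert render_inbox(ExchangeProtocol()) == "Welcome to Exchange"
```

The payoff, as the talk suggests, is that a protocol library written for the desktop client can plausibly back the Android client too, since neither frontend needs to know what sits behind the interface.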
Would those be available for Thunderbird for Android? Well, not in the immediate future, but there are ways to support them eventually. If we were to implement GeckoView, then there's a path to getting some of the add-ons to work. But we'll see. We'll see where that falls with our other priorities. What about the upgrade path? Well, we want to offer an easy upgrade path, but unfortunately, we're not able to migrate the profile from one app to the other. So they are separate installs. But we want to encourage people to lean into that Thunderbird branding that we all know and love, and please install Thunderbird for Android. But like I said, we'll continue to maintain both apps in parallel for the foreseeable future. So if you would like to try out K-9, soon to be Thunderbird for Android, you can go here. It will take you to a place where you can choose to download it from F-Droid or Google Play, and even, I think, the APK directly. You can, oh, I guess I, did I show links? Oh yeah, these links. Yeah, the monthly updates, important. The code is all there on GitHub. That's where the issues are too. The developers are super responsive and really nice, down-to-earth people. And then of course there's the Matrix channel. All of our public engagement channels are on Matrix, and the community team is doing a good job trying to create generic safe spaces, as well as specific ones for just development or design or Android. So yeah, if you have an Android phone and you want to try out Thunderbird for Android, please do. I went through this much faster than I had planned to. But yeah, if you guys have any questions, I'd be happy to answer them. What's up, Kyle? No, thanks. Right, so the question is, is it possible to export settings from Thunderbird for desktop to import into Thunderbird for Android? Right now, no, but the idea is, with a Mozilla account, it'll sync.
That's one of the important things that we want to have before that first release of Thunderbird for Android, so that you don't even have to do anything to export and import, it just happens, yeah. The question is, is it gonna be transparent, like being able to set it up with Ansible? Yeah, that's a good point. If it was exportable and importable as, like, JSON, I don't know the answer to that. I think that's a good suggestion. Yeah, thanks for the suggestion, and I can take that back. Our developers might have thought of that, but I don't know the answer. So the question is about using Proton Mail but with a third-party server that acts as a bridge, is that right? That's a good question and I don't know. Yeah, I would encourage you to go to the Matrix channel and ask it. Yeah. Yeah, that would make a lot of sense to use the API to do some authentication to be able to pull what you need. Yeah, I'll certainly take that back and see what they say. Oh wait, you got the mic, great. I actually have two questions. One is, is this URL for the K-9 that I have loaded today, or the new version that is the dev release for Thunderbird? Oh yeah, good question. So this URL will take you to download the K-9 that you have today, but then you can also go and update to the beta. I understand that the extensions from Thunderbird that are available to me now probably won't work on Android because of compile differences and stuff like that. Will the API that's currently in Thunderbird be supported natively on Android, so that somebody could adopt or rewrite or augment the Android platform with extensions and drop them in? I think the APIs will be different because it's a different code base. But if they're gonna do something in the mail, you know, do something here or do something there, will that be available? Will there be an API that could mimic, maybe not one for one, but with overall consistency?
Possibly, possibly. It's not on the roadmap right now, then. I don't know. I mean, all of the code is available, and I don't see why they wouldn't publish some sort of API if it's not already out there. Yes, this is, I guess, more of a question about sync in Thunderbird on the desktop, which I've heard comes someday but I haven't really looked into it myself. Does that sync include things like calendar accounts for Lightning? Yeah, yeah, absolutely. So the idea is, using your one Firefox slash Mozilla account, it will be able to sync all of your settings, including your calendar stuff. Okay, what are the most popular add-ons for desktop Thunderbird these days? I saw that in one of the screenshots there's a GPG or a PGP sort of symbol. I came in a little late, I apologize. Yeah, no, that's a good question, like what are some of the most popular add-ons? Are those things addressable in other ways on Android that make extensions not really as big a deal? Right, yeah. I don't have that list off the top of my head, but you can go to the web extensions site and sort them by popularity. I can tell you the most popular add-on is the Exchange add-on, the Owl add-on, which will soon be moot. Oh, it was just that there are several reasons why we are choosing to add that functionality in natively. One of them is the sheer popularity of it. And it's a paid add-on today, and it would be really nice if our many, many, many users that need that add-on to get their work done didn't have to pay for it. Well, we can't control how many people sign up with Gmail or Yahoo or Outlook, right? And it's not that the long tail doesn't matter, it's that we need to prioritize our limited resources. I forget the gentleman's name, but he's writing a book on how to create your own email server. And I'm sure he's using Postfix or something.
Anyways, it's not just a long tail, because if you have an account on Outlook or Google, you probably have an account on Outlook, Google, Yahoo, Hotmail and others, and having them all keep up with you, including your business accounts, well, it'd be nice to have a single presentation of everything. Right, that's a unified inbox, and that's pretty standard now. Any other questions? Oh yeah, yes sir, right in the middle there. I've been part of the Linux community since 1998, and for a long time I used Evolution, and one of the best features of Evolution was the ability to put all of my data, whether it's the address book or the actual emails or calendar, into a compressed file format, and be able to reinstall Linux and then take that zip file and import it into Evolution again and have everything reappear. That's one of the main reasons I chose Evolution in the first place, and I was wondering, are there any plans to make a similar feature for Thunderbird? That exists right now. It's existed for years. Thunderbird centralizes your data into a profile folder, and you can easily just lift that folder onto wherever you want it to be and then import that with Thunderbird. Is that what you were asking? So, maybe not visible, well yeah, not a lot of visibility for it, but if you go to the app menu and then just choose Help and Troubleshooting Information, it'll actually open a window that shows you where your profile is. You can open it directly from there and then just copy it to a USB stick, or upload it to Nextcloud. That's what I do, I have my own Nextcloud, so whatever installation I have, I just download it and import it. You can also do that. You can do that as well. Just, yeah, literally File, Export. Come by the booth and I'll show you guys how to do that, okay? Thanks, Jason.
You know what, it's good feedback to hear from two people who are very experienced with software who didn't know how to do that, and so that's something that we can do better: making that more visible. That's good feedback. Anybody else? Okay, well, if you do come up with any questions later, please stop by our booth for a shiny sticker. Booth 215. Booth 215, we're right next to GNOME. Yeah, awesome. Thanks everybody.

Let's see, is that working in the speakers? Cool, let's skip that for now. How's that, still working? All right, let's adjust this then. Still good? Yeah, that'll be better. Okay, so we can get to it. So, hi, I'm KC Braunschweig. I'm a production engineer at Meta, and we're gonna talk about distributed systems, our distributed systems philosophy. So, it's getting close to 20 years that I've been working on large distributed systems now, and if you want to learn about distributed systems, I think the standard approach to that, or like the depth-first search for that kind of knowledge, would be: you get an engineering degree, you get a CS degree, maybe you get a master's degree, you get an internship, you go work at a tech company, like you sort of grow from there. And I did not do any of those things. So, I would call this the opposite of that. That would be breadth-first, and so let me tell you about how I got started with this. So, I did go to school, some people skip that part too. I went to USC right here in LA, fight on, but I did not get a CS degree. I went to theater school. I did also go to the business school, information systems and operations management, because for some reason they glommed together information systems with manufacturing operations. That will become relevant later, it turns out. And also, I've been a SCaLE volunteer since the very beginning of SCaLE, thanks to being dragged here by somebody who's still in the audience today, so thank you very much for that, and I appreciate you all still being here.
I owe a lot to SCaLE. So, after I graduated, I started working in the industry. I got a job at Ticketmaster.com, started working on high-speed, high-volume sales, large queues, so, you know, hundreds of thousands of people queuing online to buy 5,000 concert tickets, which turns out to be a unique and challenging distributed systems problem. And interestingly, the first thing that became immediately relevant to me was queuing theory, or waiting-line theory. And I had actually learned this in school. I learned this in the one operations management class that they made us take when you were otherwise an information-systems-focused person. And I didn't care at all about it because I had nothing to apply it to. But I suddenly went back and I'm like, oh, do I still have that book? Let me learn something about waiting-line theory, because I'm gonna spend, turns out, the next five years caring a lot about queuing. So, I started to reflect on education and how I learned things. And I went on to Edmunds.com, was there for a short period of time, worked on configuration management, which is largely what got me in the door at Facebook, working on Chef and configuration management at massive scale. After I worked on that for a while, I worked on log infrastructure and stream processing, also at very large scale, crazy large scale. And then I worked on coordination infrastructure, which at the time was Apache ZooKeeper; now it's other things, but similar ideas. And now I work on public cloud infrastructure, tooling for things that we run in public cloud, which we actually do. I know we're known for our data centers, but we do have some things in public cloud as well, which means learning whatever the new thing is today. So, let me take a step back to distributed systems and computer science in general. There was this guy, Phil Karlton, who was at Netscape in the 90s, who said there's only two hard problems in computer science.
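Waiting-line theory gets concrete fast once you have more buyers than tickets. As a toy illustration (the rates here are made up, not Ticketmaster's), the textbook M/M/1 queue model turns just an arrival rate and a service rate into steady-state answers:

```python
# Toy M/M/1 queue numbers, a hypothetical ticket-queue example.
def mm1_stats(arrival_rate, service_rate):
    """Classic M/M/1 steady-state results from waiting-line theory."""
    rho = arrival_rate / service_rate           # utilization
    assert rho < 1, "queue grows without bound if arrivals outpace service"
    avg_in_system = rho / (1 - rho)             # L: average customers in system
    avg_wait = avg_in_system / arrival_rate     # Little's law: W = L / lambda
    return rho, avg_in_system, avg_wait

rho, L, W = mm1_stats(arrival_rate=8.0, service_rate=10.0)
print(rho, L, W)  # 0.8 utilization, ~4 people in the system, ~0.5s average wait
```

Note how nonlinear this is: at 80% utilization the average wait is already four service times, and it blows up as utilization approaches 100%, which is why queues feel fine right up until they suddenly don't.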
Some of you may have heard this, and let me stop blowing into this mic. Yeah, maybe that's a little better. So, two hard problems in computer science: cache invalidation and naming things. And then somebody pointed out later that there's also off-by-one errors. Some people have heard this joke. And it is a joke, but I also like this joke. I like it, I think it's funny, but it also captures three problem areas that I think really do exist in computer science, if we think about them a little more broadly. So, cache invalidation is like math problems, actual things that are computationally difficult, actual algorithms, right? And this is a real thing, this really does happen. But there are also people problems, right? We write software that has to be written by people, understood by people, maintained by people, deployed by people, all the things that we talked about yesterday at DevOpsDays. And that's equally important. You can fail hard at software without having any math problems. And then there's just bugs and weird shit that happens. Like, you have to deploy this in the real world and weird stuff happens and you have to deal with it. This is not theory, this is practice. And that's also important. And I know a lot of people are interviewing, I'm not here to talk about interviewing today, but I know a lot of people are interviewing and it's tough. And like, whatever you have to do to get the job at the company you wanna get into, great, go do it. Like, I have no problem with that. But if you're reading about tech company interviews and people grinding on LeetCode and all that stuff that's super focused on algorithms, like that first problem area, that is not a good representation of what we do in this job. Like, that is probably the least of these three areas that you spend your time on. So, I just don't think it's fair, I don't think it's a good thing for the industry. It's just a bummer, really.
So, what are we gonna talk about today? I didn't start my timer, so I don't even know how I'm doing. So, we're gonna start with the fallacies of distributed computing, which is a thing that's existed for a while, we'll talk about that, and then I'm gonna opine on some philosophy and things that I've learned over time, and then we'll try to apply that to some actual algorithms and patterns and things that you might see or might have seen, and then we'll draw some conclusions from that. So, the fallacies of distributed computing. Before we start with this, I should be clear here that everything these days is a distributed system, really. And, you know, there's nothing really interesting that anybody's gonna do in a single process on a single machine. Like, that's just boring, right? Whether you have a bunch of Raspberry Pis on your desk or a bunch of fricking containers running God knows where or actual global cloud infrastructure, whatever it is, there is some aspect of distributed systems that's a part of it, so this applies to you. This concept of the fallacies of distributed computing came out of Sun Microsystems, also back in the 90s, and it's a series of false assumptions that new programmers invariably make working on distributed systems. I'm just gonna tell you what they are. This is straight from Wikipedia, by the way, if you haven't seen it. The fallacies are: the network is reliable; latency is zero; bandwidth is infinite; the network is secure; topology doesn't change; there's one administrator (his name is Phil); transport cost is zero; and the network is homogeneous. Now, in the abstract, when I read these out, and we're not gonna go through them with examples and stuff, it's very easy in the abstract to be like, well, of course, nobody believes any of those things, nobody would do that in practice.
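As a hedged sketch of what taking the first fallacy seriously looks like in code (everything here is made up for illustration; `do_request` stands in for any real network call): instead of assuming the network is reliable, you retry with capped exponential backoff and jitter.

```python
import random
import time

# Illustrative only: a caller that does NOT assume "the network is reliable".
def call_with_retries(do_request, attempts=5, base_delay=0.1, max_delay=2.0):
    for attempt in range(attempts):
        try:
            return do_request()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            # Exponential backoff with full jitter, capped at max_delay,
            # so a thundering herd of retries doesn't make the outage worse.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))

# A fake request that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(call_with_retries(flaky))  # "ok" after two retries
```

The jitter is the part people forget: without it, every client that saw the same blip retries at the same instant, which is the fallacy reasserting itself one layer up.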
But it happens all the time, because there's a difference between reading these on the page and taking a running system that you built, that you're kind of proud of, that you think is pretty clever, and then keeping these in mind as it changes, as your situation changes. These aren't necessarily mistakes, they're potentially false assumptions that we make all the time. These are spherical cows. I hope you like gen AI, that's what I did for all my images. Do people know what spherical cows are? Somebody told me that they heard it as spherical chickens. So imagine there was this dairy farmer, and he wanted to optimize his dairy farm. He wanted to get milk production up. And so he went to some smart friends of his that worked at the local university and said, hey, can you guys tell me how to optimize my dairy operation? And they were like, sure, whatever, happy to help. It turns out his really smart friends are very smart and they're all physicists. They work in the physics department at this university. I don't know why he asked them for help, but hey. But they went and they dutifully did a bunch of research and wrote a report and came back and said, we have come up with a theory for the perfectly optimal milking operation. Like, it's perfect, but it only works for spherical cows on a frictionless field. Yes, in a vacuum. So, bold assumptions that don't have a lot to do with how cows actually work. But here's the thing: those assumptions, taken to an extreme, are a little silly, but physicists do this all the time. We all do this all the time. You simplify things down and make assumptions. Are those assumptions mistakes? I don't know. It depends on what problem you're trying to solve and whether that simplification helps you or is a mistake. Let's make that a little bit more tangible. Let's say you live in Southern California, and you work for a small web company, or a small company that's like brick and mortar, right?
But you run a consumer-facing website for customers that are all on the west coast of the US. So you run your website out of cloud data centers that are on the west coast of the US, and you have some geographic reliability, and life is good. You're not dumb. The latency isn't zero for your customers and the network isn't completely reliable. Regions have problems, but all these regions are roughly okay. They're pretty close, they're pretty fast, they have good connectivity. They're all kind of the same. You don't need to worry too much about those things. Until one day, when your marketing campaign goes viral and now you get national or international attention and people start hitting your site, or your CEO makes a deal for franchising in Australia, and now you're gonna expand across the ocean. Your topology is gonna change. All of these things are gonna change. All of these assumptions that you made no longer hold, and you have to revisit all of that. It wasn't wrong before, but it sure is now. People make that kind of error all the time. So let's take that into the data center. I think people know about system optimization. If you've ever done optimization at any level, you've probably thought about bottlenecks, right? When you're optimizing something, one aspect of your system is going to be the part that's making it slow. That's the bottleneck, and if you're not optimizing that piece, you're wasting your time. Making everything else faster won't help. The way this usually works out for some large application in a data center is that the bottleneck is gonna be one of three things: compute, network, or power. And this is a cycle that repeats, hence the bottleneck cycle. So in the best-case scenario, if you have a business that runs some application in a data center, you've bought some compute and you're trying to utilize that as much as possible. You want to use all that CPU to do transactions, to turn that compute into money for your business.
And so when you run out of compute, you buy more compute and you do more transactions and you print more money and life is good, ideally. Then one day you get that next piece of compute, that next rack, that next whatever, and you can't utilize it anymore, because you're running out of network, typically. You're running out of something else. Now you have to go optimize that. Adding another layer of compute doesn't help anymore, doesn't help with the actual problem, the thing that prints you money. And so you go work on that. And after you work on that, maybe you can do that again. You can go back and forth on these for a while. And then that stops working. You find out there's not enough physical space in the data center to put in more compute or more network, or there's not enough power to turn it on. Or you can't hire enough construction companies to build more data centers in the world. Whatever scale you're operating at, you will hit this thing. And so one of those three things will become your bottleneck, and you go around and around and try to solve these problems. This sort of three-way trade-off is a thing that comes up a lot. It also happens in project management. This one I bet you all have seen: good, fast, and cheap, pick two. A lot of people have seen that. You run a business, so you have to pick cheap. So really you can choose between good and fast, unless you are selling things to the government, in which case you probably can't have any of these three. It's a different problem. So that's the project management piece. So let's talk about a little bit of philosophy. To keep with our education theme, if you've taken a philosophy class, you know philosophy is not about getting the right answer. It's about some stuff to think about. And so what we take from that are trade-offs, things to think about. And when I say trade-off, it's important you understand this is non-binary. It's not about column A or column B.
It's about picking what to focus on, and you can be anywhere along a spectrum between those two things. And really this comes from that triangle that we just talked about. Typically there's three things, and usually one of those things is imposed on you and it's very hard for you to actually choose to change. And so the other two are where you can trade off and choose where you're going to spend your time, whether it's network versus compute, good versus fast, a lot of things. So let's apply this back to our little joke about hard problems in computer science. And it works like this. There's algorithms and math; you can spend time on that, optimize that. There's people problems. And then there's just weird shit. There's entropy, and you can't change that. That is imposed on you by the fact that we exist in this reality, and you have to deal with it. So this brings us to a point in my career where I'm starting out at Facebook, and so I'm working on larger systems than I've ever touched before, larger problems than I've ever had, and I'm learning about configuration management at that scale. This is also about the time of John Allspaw's web operations book, and I was learning a lot about on-call and incident management and a lot of stuff like that, which is super interesting and not what this talk is about. But I read this thing called How Complex Systems Fail, written by a medical doctor named Richard Cook. It got a lot of play at this time. If you haven't read it, please go read it. It's very short, it's awesome, and it has a bunch to say about how things fail and how you prevent failure, and it's not written about computers at all. This is a medical doctor writing about how people die in operating rooms. Also, my dad was a doctor, and so it was a fun connection. Anyway, go read it, it's awesome.
But we're not here to talk about incident management and that sort of stuff. It also caused me to step back and think about why somebody who is a doctor, writing about patients and patient safety, was so relevant to computing, and so I took a different view of the systems I was working on. By the way, this is my gen AI attempt at the Swiss cheese model, for those of you that have seen that. The thing that I came to think about is, I had always thought of systems in general, and systems problems, as something to be understood. I didn't have this background, didn't have that deep knowledge, and I saw a bunch of people, I also had imposter syndrome, which I still do, and I thought all these people were smarter than me, that they sort of understood everything and I didn't because I wasn't smart enough. And as I started to see larger and larger systems, I started to realize that at large scale it's impossible for anyone to hold that in their head. It can't be just that these people are smarter. Nobody does that, there's something different going on. And I started relating it to what Richard Cook was writing about biological systems, and thinking about how complex biological systems work. So if you think about your body: people, animals, anything, are these massively complex systems made up of simple cells, tons and tons of simple cells, that all work together to produce our bodies. And our bodies work in degraded modes all the time. We have an immune system that's constantly fighting things off, and we're trying to perform at our best, but we're never at 100%, and yet we create these amazing things, we create the whole world around us. Fantastically complex, but not designed to be complex: built out of simple components that work together to create complexity as an emergent property. And as I saw well-designed distributed systems, they have the same characteristic.
Things that are designed to be complex are impossible to understand and impossible to make reliable. Things that are built out of simple components, people can understand: they can understand the simple components and then make assumptions about the larger system. And they also tend to be more reliable. So it changed my way of thinking about that, and it changed my way of thinking about my own ability to understand this stuff. And I think when you work at small scale, as everyone does when they're starting out, it tempts you with this myth where you think you're really smart because you can understand everything. And it also has this thing that I describe as inertia. When you have a small-scale system, it's typically very easy to turn the whole thing off. Something's wrong, and you can sort of understand all the starting conditions, and you turn the whole thing off and reset all the starting conditions and understand it all and turn it back on again. When you have something like Meta's infrastructure, that is impossible. Cold starts are not a thing. When you have a human body, again, cold starts are really a problem. You don't want to try to do that. It typically doesn't work out. But you see this, right? If Meta's ever been down, oftentimes we know how to fix 'the cause', as if there's a single root cause; that's a whole different talk. But we know how to fix the problem very quickly, and it still often takes hours for things to be back to normal, because these systems have this inertia that you have to get through to recover, right? Which is why I'm still coughing even though I'm past the toddler plague. So we have these choices, all right? And again, I was working on configuration management. This is not a talk about configuration management, but if you were there at that time, there was this philosophical debate going on between command-and-control and cooperation, or imperative and declarative methods of configuration management.
I don't wanna start that, because that's not what we're here to do, and people have opinions, whatever. I was working on Chef, so you probably have a good idea where I came down on that. But let's think about how this trade-off works. In an imperative system, you're describing a specific set of actions. You're like: I'm gonna know the state of a system, I'm gonna say a bunch of things that are supposed to happen, and then I'm gonna force reality to bend to my will, and then at the end of it I'm going to have the system I wanna have, right? It's this idea that you will be in control and you will know, and you will, in a centralized way, be able to have certainty about a larger system that you can't necessarily hold all in your head at once. It's a bold claim. And then there's the other model, of cooperation and declarative, where we declare the state that we want the world to be in, and we design a system that attempts to converge on that state over time. And it becomes very, very difficult to know at any one moment, are we actually in that state or not, but we're very confident that we're trying to get there, and things will probably be better than they are now, soon, right? Much like our immune system. And at this point I started getting some of the academic backing for this. So there's a guy named Mark Burgess, who spoke here at SCaLE, who's awesome, who's a college professor, a pure academic. At this point I read the book; it is very dense, it is very academic. If you're really interested in this stuff and you give it a read, it is not easy to parse. I would probably not have been able to get through it, but I was fortunate enough at the time to work with Adam Jacob, the guy who wrote Chef, who's also spoken here and is awesome. Adam is not an academic. He is deeply focused on pragmatically helping sysadmins solve real-world problems.
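A minimal sketch of the declarative/convergence model (toy code, not actual Chef or CFEngine): you declare the desired state, and each run fixes only what diverges, so running it over and over is harmless. That idempotence is the whole point.

```python
# Hypothetical desired state: which services should be running.
DESIRED = {"nginx": "running", "ntpd": "running"}

def converge(actual, desired):
    """One convergence pass: touch only what diverges from desired state."""
    actions = []
    for service, want in desired.items():
        if actual.get(service) != want:
            actual[service] = want           # stand-in for "start the service"
            actions.append(f"set {service} -> {want}")
    return actions

state = {"nginx": "stopped"}                  # the world as we found it
print(converge(state, DESIRED))  # ['set nginx -> running', 'set ntpd -> running']
print(converge(state, DESIRED))  # [] -- already converged, nothing to do
```

Contrast with the imperative version, which would be a fixed script of commands that assumes it knows the starting state: run it twice, or from a state it didn't anticipate, and it breaks. The convergence loop doesn't care where it starts; it just keeps promising to get closer to the declared state, much like the immune-system analogy above.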
And he's really good at relating this stuff, and so that gave me a way into this thing that was very deep, about promise theory and about the sort of mathematical and philosophical backing of why this approach to configuration management works, and it helped me make it real for a real problem I was trying to solve. So that was my journey with configuration management. And these ideas extend to any sort of pull-based system versus push-based system. This comes up a lot: you can have a push-based system or you can have a pull-based system, and in the pull-based system you may then want to build push-like semantics, because they will help you for some reason, and you can generally do that, but it takes some extra work, and you learn about these trade-offs. So that was a big learning about trade-offs for me. So, everything I've been talking about, the next thing I wanted to put on this slide was certainty versus entropy. That felt like the next short, concise thing to have here as a trade-off, and I really wanted to do it, and then, thinking about this talk, I realized I was wrong. It's not about certainty. Entropy is not optional. The actual trade-off here is the myth of certainty, this thinking that we know what's going to happen, versus accepting that entropy is not optional, that the real world is a thing, that we have to deal with that, and that we are attempting to mitigate that entropy as much as possible. It doesn't sound clean, and in practice it's not. This is a hard truth about working in the real world. But let's step back from that for a second. Speed versus completeness. This is another one that's a little more concise. After I worked on configuration management, I moved on to log processing and logging infrastructure and data stores, moving data. So if you're building a data store and you're collecting a lot of data that you want to query, do you care more about speed or completeness? I don't know. You need to tell me what problem you're trying to solve.
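One way to make the speed-versus-completeness trade-off concrete is a sketch of a monitoring-style query that deliberately accepts partial results (everything here is invented for illustration; a completeness-first query would instead block until every shard answered):

```python
# Hypothetical monitoring query: answer fast from whichever shards responded,
# and scale up the total to estimate the shards that timed out.
def estimate_total(shards, min_coverage=0.9):
    """shards: list of (responded, values) pairs. Returns a scaled estimate."""
    responded = [values for ok, values in shards if ok]
    coverage = len(responded) / len(shards)
    if coverage < min_coverage:
        raise RuntimeError("too few shards answered to trust the estimate")
    total = sum(sum(values) for values in responded)
    return total / coverage  # scale up for the shards that didn't answer

shards = [(True, [10, 10])] * 9 + [(False, [])]  # one shard timed out
print(estimate_total(shards))  # ~200, estimated from 90% of the data
```

The financial-reporting version of this function would have no `min_coverage` knob at all: it waits, however long that takes, because an estimate is not an acceptable answer there.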
If you tell me you're collecting that data because you want to do financial reporting on it, you probably care a lot about completeness. You will wait for a slow query to make sure that you query that data when you can guarantee the state of that data, and that you've accounted for everything. It's probably required by law, in fact. Now, if you tell me you're building a monitoring system, you want a very different trade-off. I want to know when we violated an SLO as quickly as possible, so we can respond as quickly as possible. And I don't need to be exact about it. If we only query 90% of the data and we're a little bit off in when we calculate that SLO violation, who cares? It's probably right. And if we're wrong, it'll be okay. But I want to know sooner. I don't want to know in five minutes. I want to know now, so we can respond faster, so we can fix problems. And finally, security. I think a lot of people, especially security people, like to think of security as a trade-off between insecure and secure. That is not what it is. It is security versus usability. To illustrate this: if you want something to be secure, lock it in a safe and light it on fire. But it will not be very useful. This will not produce a business that makes money. Sorry, it just won't. You need to make some trade-off. It doesn't mean I don't care about security, I do. But we also need to get something done. We have some problem to solve. And the trade-off that you make here depends on your appetite for these two things. If you're writing software for the NSA, you care a lot about security, for very good reasons, and you're willing to trade off your user experience to get it. And your users are probably fine with that. We should still care. But it's very different if you tell me you're writing software to be used by a bunch of retail workers that make minimum wage. Their appetite for that is very different. And finally, we need to talk about time.
We're not actually going to have a philosophical discussion about time and the true nature of time. We're being practical here. But we still need to talk about time, because even in a practical sense, in distributed systems, time is just hard. And a big reason it's hard is because we have these intuitive beliefs about how time is supposed to work. Time is supposed to be linear, and it's supposed to be monotonically increasing, like seconds since the epoch. It's just what your brain wants to believe. And it's just wrong. It doesn't work out that way. We have leap seconds. We have daylight saving time. We just decide to change the clock because, whatever, because the law says we have to change the clock, and it sucks. We have snapshots, where we take a snapshot of a machine and we come back a week later and it comes screaming out of the void and finds the whole world has moved on. And what do we do now? We have choices. We have a trade-off to make about how to fix this problem. A typical NTP way to deal with this problem is slewing the clock. We'll just nudge these seconds and make them a little longer, a little shorter, or whatever, and nudge that clock back to where it should be. And if the clock's off by a little bit, it'll probably be OK. That's probably fine. But what if you're launching that thing, you're launching the system, and its clock is way off? And cloud-init or Docker Compose or systemd or whatever has this big long list of all these things it's trying to start up, because you want to start running that compute and printing money as fast as you can, because you're wasting time. And you need to issue certificates or something. All of these things are going to fail unless you have a consistent view of the clock. It's fundamental to security, to a ton of stuff. So you better get that clock right, right fucking now, because all the shit's going to fail. These trade-offs matter, right? So anyway, that's the normal case.
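A toy model of that choice, simplified far beyond what ntpd actually does: slewing keeps time monotonic by stretching or shrinking each tick a tiny amount, which is safe but slow; stepping jumps straight to the right time, which is instant but discontinuous. The rate here (0.5 ms per second) is just an illustrative number.

```python
# Toy clock correction, not real NTP code.
def slew(clock, target, max_rate=0.0005, tick=1.0):
    """Stretch or shrink each tick until the offset is within one adjustment."""
    ticks = 0
    while abs(target - clock) > max_rate * tick:
        adjust = max(-max_rate * tick, min(max_rate * tick, target - clock))
        clock += tick + adjust   # this second runs slightly long or short
        target += tick           # true time marches on regardless
        ticks += 1
    return clock, ticks

def step(clock, target):
    return target, 1             # instant, but time jumps (maybe backwards)

_, ticks = slew(clock=100.0, target=100.5)
print(ticks)  # a 0.5s offset at 0.5ms/s takes on the order of 1000 ticks
```

That's the whole trade-off in miniature: a freshly launched machine half a second off would take ~17 minutes to slew into line, which is why at boot you step, certificates and all, and only slew once you're close.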
We talked about good, fast, and cheap, and you have to pick cheap. Well, every once in a while, you do get to pick not cheap. An example of this happened at Google, unsurprisingly. They said, we want a database, or at least a global data store, with monotonically increasing timestamps. They wanted time to work the way that we think it should. And they made it happen, and this is how Spanner works. Now, I never worked at Google. I'm not going to try to represent how Spanner works. I know there are people who work at Google here. But at a very high level, they made heavy use of hardware-based clocks, GPS clocks, and atomic clocks installed all over the place in order to make time work the way that we think it should and have global consistency and monotonically increasing timestamps. So yes, sometimes you can make the other choice, but it's going to cost you. And that brings us to the end of philosophy. Let's try to apply this like they did. So, algorithms and patterns. Once again, when we talk about algorithms here, I'm not qualified to teach you math. I'm not going to try to teach you math. We are taking a Wikipedia-level view of this stuff. And there's a reason for that. I don't know what your problems are. I don't know what matters to you. I want you to see some patterns. And then if you have that problem and you need to dive deeper, you will know enough to go learn more from something more reputable, whatever. So that's where we're at. That's OK. You can also ask ChatGPT. It's actually kind of good at explaining some of this stuff. New world. OK. We're going to start with big O notation. And this is just a little bit of a rant. Big O notation comes up a lot, especially with interviews. It's just a notation for describing algorithmic complexity. Algorithmic complexity is important. I don't give a crap about big O notation.
How often does it happen that you have five different algorithmic implementations for the same problem and somebody wants you to rank order them before you pick the one that works? That is a test question. That is not what happens in the real world, outside of maybe a couple times at Google. I'm not saying you shouldn't care about this, but you need to identify the piece of your system that's slow and figure out how to make it faster. And if studying big O notation helps you do that, great. And if you never learn it, also great. I suck at this to this day. Hasn't hurt me. But let's talk about some things you actually will see. Traveling salesman problems. So, any kind of thing that involves routing. If you're routing packets, if you're routing trucks, if you're routing airplanes, imagine all of these dots are cities, right? And you're trying to pick the shortest distance to visit all of these cities exactly once and come back where you started. A million different types of routing fall into a kind of problem that is roughly mathematically like this. Because remember, when mathematicians and academics study problems, they're looking for generic solutions, generic algorithms to solve every possible flavor of this problem, right? But you're not. You're solving a specific problem. But there's a whole bunch of problems that are roughly like this. Another one is knapsack problems. In the knapsack problem, you've got a backpack and you can put a certain amount of weight in it. And you've got a bunch of books that are all different values. And you want to maximize the value you put in there and stay under the weight limit. OK, let's make that a little more real world. Imagine you have a bunch of knapsacks that are different sizes and a bunch of different books to put in them and do that same thing. Well, now that's a bin packing problem. So if you have a bunch of different containers and you need to run them on different size worker nodes, that's what this is.
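The container-packing version of this has a classic heuristic you can just copy: first-fit decreasing. This is a minimal sketch; the core counts are made up, and real schedulers juggle memory, affinity, and more:

```python
def first_fit_decreasing(items, bin_size):
    """First-fit-decreasing bin packing: sort items largest first,
    place each into the first bin with room, open a new bin if none
    fits. Provably within roughly 22% of optimal in the worst case,
    and usually much closer in practice."""
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= bin_size:
                b.append(item)
                break
        else:
            bins.append([item])  # nothing had room: open a new bin
    return bins

# Containers (CPU cores requested) packed onto 8-core worker nodes:
containers = [5, 4, 4, 3, 2, 2, 1, 1]
nodes = first_fit_decreasing(containers, bin_size=8)
print(len(nodes), nodes)  # 3 nodes, which here is optimal (22 cores / 8)
```

This is exactly the copy-and-cheat move from the talk: the general problem is NP-hard, but a ten-line heuristic gets you close enough for most scheduling work.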
It's that kind of problem. And there's a ton of different real world problems that are flavors of this. So generically, this is where you'll hear about P versus NP, NP-hard and NP-complete type problems. You can go read the Wikipedia or do something else. You can read about the real math of what that means. I'm not going to try to explain it. But those are the keywords that will get you to the start of it if you really care. This is an extremely well-studied area of computer science, as some of you I'm sure well know. And most of this stuff is unsolved. In the general case, if you can solve those, you can win the Millennium Prize, and good on you. But I bet you that's not what you're doing at your job. So my advice to you, if you need to solve a problem like this, is three things. Copy, steal, and cheat. Like I said, all of these problems are very well studied academically. And they've learned a lot of stuff. They haven't solved the general case, but for constrained versions, for versions that are not the general case, some of those constrained versions are solved. There are algorithms you can copy that will give you an optimal solution in the right case. For a lot more constrained cases, there's an approximate solution that will get you within 2% or 3% of optimal, and you can just copy that. 2% or 3% is probably good enough for most of the things you need in this area. The way that you do that is a people problem. It's requirements gathering. This is an interview thing, I can tell you for sure, because I did it. If you come to Meta and I give you a systems design interview, the first thing I'm going to do is give you a problem to solve with very little information. And the next thing you are supposed to do is ask me more questions to refine that and figure out, effectively, what is the least you can build and still give me the thing that I want by gathering more requirements. This is a critical, non-algorithmic, non-math skill for you to have.
Figure out the problem that actually needs solving, identify more constraints, and make something that someone describes to you as a hard problem into a much easier problem. Because you don't have to solve the problem I ask. You have to solve the easiest problem you can convince me will actually solve my problem. But you can't make it up. It does have to actually solve my problem. I'm going to find out. But I often don't know my problem very well either. So we mentioned speed versus completeness and good, fast, and cheap. There's one more of these sort of triangular trade-offs that I wanted to touch on just in case you haven't seen it, which is CAP theorem. And this applies to data stores, typically, or transactional systems. Consistency, availability, and partition tolerance. Quickly, in case you haven't seen it: supposedly you pick two, but really you can't choose freely. You must pick partition tolerance. Why? If you have a system with more than one node, the nodes can lose communication. They just can. That's reality. You have to decide what to do in the case that that happens. Your choices are: serve errors in order to not serve potentially stale data, so give up availability to preserve consistency. Or you can potentially serve stale data and maintain availability and give up consistency. But you have to have a way to deal with partitions. Or you can choose not to handle partitions at all, which means if anything ever goes wrong, the whole thing goes to shit, and you lose all three. Those are your choices, so pick you must. So let's talk about a real data store. Databases and stream processing. We talked about this a little bit. In a traditional database, you're bringing together a bunch of data, storing it in a database, and you run queries. And so all of the data is fixed in one place. Yes, there's sharding or whatever. But roughly, you have static data, and you pass a query over it.
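That forced choice during a partition can be caricatured in a few lines. The `Replica` class and its "CP"/"AP" modes are invented for illustration; real systems make this decision per operation, not per process:

```python
class Replica:
    """Toy replica illustrating the CAP choice. When cut off from
    the rest of the cluster, it either refuses reads (stay
    consistent, lose availability) or serves possibly-stale data
    (stay available, lose consistency)."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" behavior under partition
        self.value = "v1"         # last value this replica saw
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm freshness")
        return self.value         # in AP mode this may be stale

cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True

print(ap.read())                  # answers, but maybe with old data
try:
    cp.read()
except RuntimeError as e:
    print(e)                      # refuses rather than risk staleness
```

Neither branch is wrong; which one you want depends on whether stale data or an error page hurts your users more.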
This is very much in the sense of that imperative or command and control model we talked about before, which is to say that ad hoc querying becomes easy. Like, yes, the data could be changing, but you have locks, or you have views, or whatever you have. You have a fixed view of the data. You can run this query, and be like, ah, I didn't like that. I'm going to iterate on it, and run the query again, and it's kind of easy. But it might be slow. It might have a bunch of other problems. But what happens when you need to make different trade-offs and you do stream processing? Well, this took me a little while to grasp intuitively as I was working on logging systems: it's the complete opposite of that. The query is fixed in place. You have to know it ahead of time, because your data is passing through a pipeline, and it's passing through the query. So what happens when all the data has passed through the query? Well, you get your response, but if you're doing ad hoc queries and you want to iterate on that, the data's already gone. Maybe there's a way you can replay that data back through the pipeline, but it's not quite the same as the first time it went through, so how do you know? You don't have that fixed view that you could use in that reporting. But it might be a heck of a lot faster, in many cases. So let's think about these trade-offs, right? I talked about imperative versus declarative. Speed versus completeness turns out to be really funny because of time. When the query's fixed and the data's moving, what time did you run the query at? How long does the query take? Well, that depends on a lot of things that we would need a more clear example to go into, but it's not one single answer that's easy to intuit. Does that matter? I don't know, it depends. Who are you showing this query to? And this is at small scale, right? This is just looking at the data stores.
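The fixed-query, moving-data shape looks roughly like this in code. The log records and the predicate are invented; the point is that a generator, like a stream, can only be consumed once:

```python
def streaming_count(stream, predicate):
    """The query is fixed up front; data flows past it exactly once.
    There is no going back to re-run a different query afterwards;
    by the time you think of one, the stream is gone."""
    hits = total = 0
    for record in stream:
        total += 1
        if predicate(record):
            hits += 1
    return hits, total

# A generator: each record is produced, examined once, and discarded.
logs = ({"level": "ERROR" if i % 10 == 0 else "INFO"} for i in range(1000))
print(streaming_count(logs, lambda r: r["level"] == "ERROR"))  # (100, 1000)
```

Try iterating on the query the way you would against a database and you hit the wall the talk describes: `logs` is already exhausted, so a second, different predicate sees nothing.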
So let's take this and drop it onto some real infrastructure. This is roughly a logging system that's pretty generic. It's similar to what we do at Meta; people do this in a lot of places. So you generate logs on the left there, on a bunch of different nodes. We have a couple different logs coming together, and they get aggregated through one or multiple layers of fan-in. They go into some temporary data store, and then they get fanned back out, sharding, bucketing, something, maybe multiple layers of that, and then read by different systems, maybe multiple systems. Long-term storage could be a database that we query, stream processing that we run queries on, stuff like that. Okay, so let's put some actual data in here, the simplest possible log I could describe as an example. We're gonna write the numbers one through 10 as individual log messages and then try to get that back out the other side. It should be very obvious what we want to come back out the other side: just what we wrote. And that can fail in a bunch of different ways for different reasons. So let's talk about it. First, data duplication, right? We get more than one "two" in that first example. How does that happen? Well, it could be a bunch of reasons, but maybe that thing got written from one hop to the next but the acknowledgement failed. And so we sent it again in order to maintain reliability, so as not to lose data. And we ended up with two of them. Different things can happen with buffering, because there's probably buffering at all these different layers too, in various different ways. Could be buffering in memory, buffering on disk, lots of things. So we could get duplication. We could get things out of order. That's really easy to happen. Oh, we've glossed over one thing here.
We're assuming that we wrote all this stuff in order from the beginning, but if this is log A, it's being written from two different nodes. Do those two nodes actually have global consistency for the time that they're writing stuff? They might write things out of order from our perspective. But let's assume that they don't. Let's assume it went in the way that we want it to. We might still get things out of order. Let's say something buffers along the way, and different things buffer on different machines as they're getting aggregated, and the buffers fill up and then get flushed at different rates. And we end up with things out of order. There's, again, a million ways this can happen. Okay. And finally, maybe we lose some stuff, right? The nine is missing. When we know what we put in, it's really obvious that the nine is missing, but if you don't know what logs you expect, how do you know what's missing? How do you even know anything is missing? How would you tell? And does it matter? Or how much does it matter? So where are our trade-offs there? And then time. Like I said, time makes this way harder. So we're missing data here in our stream processing query, but as we talked about, we have this time range when our data passed through. Well, did we really lose that nine? Maybe we did. Maybe it was buffered to disk somewhere and then that machine exploded and it's just gone forever. It could be that. Maybe it buffered to disk and it lost network connectivity and got delayed. And it got delayed until things get fixed, by an hour or a day or a month; how long is your repair cycle? I don't know. So does that mean it was lost? I don't know. It depends on your requirements, right? But if that data shows up an hour later and that streaming query has already completed, it thinks that data is gone, but maybe it's not. And I don't know if that matters or not, but you'd better.
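All three failure modes are easy to reproduce in a toy model of a single pipeline hop. The probabilities are arbitrary and the mechanism is deliberately crude; it just shows how retries, drops, and late buffer flushes mangle the one-through-ten example:

```python
import random

def lossy_hop(messages, p_dup=0.1, p_drop=0.1, p_delay=0.1):
    """One hop of a log pipeline where the network misbehaves:
    a lost buffer drops a message, a slow flush delivers it late
    (out of order), and a failed ack causes a resend (duplicate)."""
    out, delayed = [], []
    for m in messages:
        r = random.random()
        if r < p_drop:
            continue                    # buffered to a disk that died
        if r < p_drop + p_delay:
            delayed.append(m)           # flushed late: arrives out of order
            continue
        out.append(m)
        if random.random() < p_dup:
            out.append(m)               # ack was lost, so the sender retried
    return out + delayed

random.seed(42)
print(lossy_hop(list(range(1, 11))))    # duplicates, gaps, reordering
```

Run it a few times without the seed and you will see every combination the talk lists: more than one "two", a missing "nine", and stragglers tacked on at the end.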
So the way this tends to play out, in the trade-off that's more unique to a logging system, is at least once versus at most once. A bunch of these different tuning trade-offs that you can make at these different layers with buffering and reliability come down to: what's worse, duplication or missing data? It is possible that you could have a situation where data duplication is really, really bad. I think that's less common, but let's say it's data where you can detect the loss because you're matching it up to some other data source, and you can be like, well, we know when stuff is missing and we can replay it. So we can say, oh, okay, something's missing, we'll try again and we'll send it again. If that's true for you, maybe you want at most once, to prevent duplication. Probably, in a lot of cases, you want at least once. You would rather have duplication, because losing data is bad. So then what do you do about that? You know you're gonna have duplication; you've made the choice to do that. Well, what you're probably gonna do is some post-processing. You're actually gonna pay the cost of additional compute in order to do deduplication after the fact, to get rid of your duplication and give you something that approximates exactly once, right? You make your trade-off, you pay the cost, and then you get the semantics you want. Okay, let's say one more thing about scale. We talked a little bit about buffers. So these messages are really, really short. What about latency? When you have really, really short messages, the data is small compared to the overhead of sending the message. So if you want very low latency, you send the messages individually so that they go out faster, and you can get extremely low end-to-end latency even with a bunch of hops in a well-designed system. You know, I don't know, 50 milliseconds, 100 milliseconds, something like that.
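The post-processing deduplication pass described above is, at its core, just this, assuming each message carries some unique id (which is itself a design cost you pay at write time):

```python
def deduplicate(messages, key=lambda m: m["id"]):
    """Turn at-least-once delivery into something approximating
    exactly-once: keep the first copy of each message id, drop the
    rest. You spend extra compute and memory (tracking seen ids)
    to buy back the semantics you actually wanted."""
    seen = set()
    out = []
    for m in messages:
        k = key(m)
        if k not in seen:
            seen.add(k)
            out.append(m)
    return out

received = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}, {"id": 1}]
print(deduplicate(received))  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

At real scale the `seen` set cannot grow forever, so production systems bound it by time window or use probabilistic structures, which is yet another completeness trade-off.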
But if this is the logging system at Meta, you need extremely high aggregate throughput. You need to move a bunch of data. Most messages aren't that small, but some are. And so you do chunking, right? You have a buffer and you also use those buffers, not just for reliability. Well, we're only gonna send the message forward every 500 milliseconds or every second. And then we can chunk together a bunch of those tiny messages, like in one bundle, and then we can send things faster with lower overhead. This has a massive effect on overall throughput. But at the cost of, on our system, end-to-end latency is more like one second or two seconds instead of tens or hundreds of milliseconds. Like for us, that trade-off is okay because we need massive throughput and we'll take a little latency to get it. But yeah, exactly once. What happens when you really do want that thing like what they wanted with Spanner? Well, for a super high-volume logging system, probably not what you want. But in some cases, you do. So what do we do about it? So for that, we go back to our Wikipedia diagrams. I need to tell you about the two generals problem. So imagine these things, A and B are armies. A1 and A2 are pieces of our A army and then B is the opposing army and A wants to attack B. But the people commanding those two bits of the A army need to coordinate. They need to attack at the same time or they will fail. So they need to pass messages to each other but they can only do that by passing messages through enemy territory. And so those messengers might be killed, manipulated, affected in some way or our messaging is unreliable. There's a whole bunch of math that goes into this and what you find out is you can send more messages and acknowledge them and send acknowledgements back and forth and more and more and more and no matter what you do, there's always the chance that the last acknowledgement doesn't make it back and there's some small risk that your attack will not be coordinated. 
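The two generals result says the risk never reaches zero, but it also lets you quantify what you are left with. A toy model, assuming each messenger independently gets through with the same probability (real message loss is rarely independent, so treat this as illustrative):

```python
def confidence(p_messenger_lost, n_messengers):
    """If each messenger is lost independently with probability p,
    sending n redundant messengers fails only when all n are lost.
    More resources buy a quantifiable, but never perfect, level of
    confidence that the message arrived."""
    return 1 - p_messenger_lost ** n_messengers

# Spending more messengers shrinks the risk by a known amount:
for n in (1, 3, 5):
    print(n, "messengers ->", f"{confidence(0.2, n):.5f}")
```

This is the shape of the trade-off Paxos-style protocols hand you: pick how much you are willing to spend, and you get a known, bounded, nonzero chance of trouble in return.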
This is an unsolved problem, potentially an unsolvable problem, of coordination. You cannot be perfectly sure. And this is where we get into consensus and consensus algorithms. Distributed locking and leader election are typically the places where you need this, which applies to a bunch of different stuff. We saw this with Spanner; Spanner does this. And if you have this type of problem, like we said, you provably cannot be completely sure. However, you can provably know the risk. So there are algorithms, there are methods you can take, that effectively give you a set of trade-offs: for different amounts of resources you're willing to spend, messengers you're willing to put at risk, you can have a known level of confidence in whether or not you will be coordinated or you will run into a problem. So there are reasonable approximations, things for you to copy. The first of these is called Paxos. Paxos is a family of algorithms. If you haven't heard of this, this is what Spanner uses. A lot of other things use this. They do a very good job. They are mathematically proven, not to solve the general case, but to deliver those approximations and give you a set of trade-offs that are reasonable. You should not reinvent this; you should use it. However, there's also an algorithm called Raft, which has become popular in the last several years, that does almost the same thing. It does it in a slightly different way, but it has almost all the same mathematical proofs. Algorithmically, it solves the same problem in roughly the same way. So why? Why does this exist? Well, because of this trade-off we talked about before, the algorithm was not the problem. It was a people problem. Paxos is really fricking hard to understand. It just is. And people were tired of that. They were like, well, we want something that's easier to reason about. And so one of the design goals for Raft was to solve the same problem in a way that was easier for humans to understand.
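Without reproducing either algorithm, the safety kernel Paxos and Raft share is small enough to check by brute force: any two majority quorums of the same cluster must overlap, so two conflicting decisions can never both be accepted. A minimal sketch:

```python
from itertools import combinations

def majorities(n):
    """All majority quorums of an n-node cluster."""
    nodes = range(n)
    need = n // 2 + 1
    return [set(c) for k in range(need, n + 1)
            for c in combinations(nodes, k)]

# The shared safety core: every pair of majorities intersects in at
# least one node, so that node would have to approve both decisions.
qs = majorities(5)
assert all(a & b for a in qs for b in qs)
print(f"{len(qs)} quorums, every pair intersects")
```

Everything else in those protocols, the proposal numbers, terms, and log matching, exists to exploit this one overlap guarantee safely in the presence of failures.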
This is what etcd, which anybody who's used Kubernetes has used, is based on; lots of other things are based on Raft too. So remember, you get to pick both of these problems to work on, and sometimes it matters. I'm not espousing reinventing the wheel, but in this case, they had a good reason and arguably actually achieved it. So good for that. So now what? Okay, what should we learn from this? Entropy is not optional, right? You can build your system ignoring the fact that it has to run in the real world, and it will tend towards chaos. Or you can design your system to inherently attempt to deal with reality, mitigate entropy, and potentially you can hold back some of the chaos, most of the time. Sometimes you'll still get woken up in the middle of the night, but hey, we need jobs, right? But remember, you're solving a practical problem, not a theoretical problem. If you wanna go work at a university and do that, great. You don't have to solve the problem you're given, but you do have to solve the actual problem that someone has, and not what you wish the problem was, or the thing that lets you work on the cool tool that you wanna work on. But you get to ask questions, like how good is good enough? How fast is fast enough? You get to gather more requirements and constraints and turn what someone presents as an intractable problem, which maybe you now know how to recognize, into something that's more constrained and more tractable, and then you get to cheat. You don't have to tell them that's how you did it, but you get to cheat. And hopefully you understand why this is my breadth-first approach to learning this. This is not what they will teach you in engineering school. But the two things I will suggest to you, if you're taking this approach like I did: seek out interesting problems. I would never have gone into depth in any of these problems until I had something tangible, some real problem that I cared about solving to apply it to.
It was the only way I was able to get interested in studying and going into that depth. So find interesting problems and then we'll see where they lead you. And also find interesting, smart people that know about this stuff to surround yourself with, if you can. I've been really fortunate to be able to do that. When you have imposter syndrome, it's hard. It makes you wanna pull away from those people that you think are smarter than you, but they have a lot to teach you, so you need them around. And don't try to remember details. This is why we did that Wikipedia-level version of this. You don't need to know the details if you don't have to solve that problem right now. You just need to remember the pattern so that you know how to go deeper into it if you actually need to solve that problem, so you can find what to copy and what to steal from. And I did go to theater school, so I'm going to finish on a theater topic. I wanna tell you about the word hamartia. This is a term that comes from studying Greek theater. If you've ever felt like the protagonist in a Greek tragedy, hopefully that doesn't happen too often in your work, but maybe sometimes. Hamartia describes a fatal mistake, a fatal flaw that leads to the downfall of our protagonist, and the classic fatal flaw is hubris. It's ego, pride. This is also a classic fatal flaw for engineers, sometimes fatal in the literal sense, maybe not so much for computing, but sometimes this is real, right? There is a temptation, we're too clever for our own good, we fall in love with our tools, we fall in love with a certain problem and decide we need to apply that solution to everything.
We don't revisit our assumptions, or worse than that, and I see this all the time: we constantly talk about iteration, we talk about learning and building minimum viable products, and then what happens over and over is someone builds a minimum viable product, they build version zero of something, and then what they should do is build version one and version two and learn and revisit their assumptions. And then they don't. They throw it all away and they build another version zero of the same thing and say, I'm gonna get it right this time, I'm gonna be smart, I'm gonna know it all. That is hubris, that is a fatal flaw, that will lead you to your downfall. Don't be that guy. That is all I have for you, thank you very much for being here. Do we have questions? Let me see how we're doing on time. We have a few minutes. No, 10 minutes, perfect. If we have questions, I'll run mics to you. Please don't ask without a mic so we get it in the recording. Hi, so you mentioned the compromise between security and usability. Did you have any situation at Meta or any other jobs in which proposed security methods, let's say, or techniques didn't go well in production? Of course I would get that question first. Yes, I'm sure I have a lot of examples of this. I don't have one in mind, and so I don't want to just riff on something like security. I don't want you to take that to think that I don't care about either security or usability, because I do.
What I will say is, often this one again comes down to a people problem. It comes down to different people with very different agendas and different problems they're trying to solve. And especially for production engineers, at least at Meta, a big part of what we're doing is taking the bigger picture: figuring out what to do about security engineers who are only thinking about security and not thinking about some of the non-security-related consequences of their choices, and somebody else who cares about the service and probably isn't an expert in security. And I'm an expert in neither, and somehow I need to try to get what's best for both of those people. There's a lot of room for that; it's a hard skill to have. Come be a PE, we're awesome. You said that, what was it, the Paxos and Raft algorithms have a trade-off? What is that trade-off? I think what I said is, Paxos and Raft implement a trade-off, right? The trade-off is that you can't have absolute global consistency, at least the way the two generals problem describes it. There's always a risk of something happening, but we can quantify that risk. We can make a trade-off between how many resources we wanna expend; if we need to reduce that risk more, we can pay more cost, right? In the analogy, we put more messengers at risk, and we reduce the risk by a known amount. But that's a high price to pay, and for some applications, maybe that's not worth it, right? Maybe we don't need that level of assurance. But we need to quantify it, or else we don't know. Okay, I'm kind of riffing here, so we'll see how this goes, but kind of pushing back on what you were saying before about getting the security people to care about usability and the other people to care about whatever.
I feel like whenever you're having a conversation in a big group of people, everybody comes with a particular perspective, and sometimes it's a matter of kind of unlocking them to think about a particular view, and sometimes it's about getting them to understand something that they don't actually know quite yet. Do you have any tips or strategies or tactics, whatever, in dealing with a large group of people, which is itself a distributed system, and trying to get them to some sort of middle ground in a way that's approachable and good, I guess? Does that make sense? It does. I think that that is its own whole talk. Yes. In the context of this talk, I think that illustrates the point of people problems and how important people problems are, because there's nothing math-related about that, but it's critical to get that right to actually solve the practical problem you're trying to solve for a business or an organization or whatever it is. I have an idea for a talk that's focused on security where I would have to address that head on, and so if I get a chance to do it, then hopefully I will have something better to say about that. Cool. See you next year then. But also, I'm not a security engineer, so I don't know if I can give that talk with any authority, but maybe. I'll make a callback to something you said in an earlier slide, which is: requirements gathering is really useful in those conversations. Anyone else? Going once. Going twice. And so, thank you all. Oh, do we have a question? We do. One more question. Okay, last one. So I wanted to ask about when you were talking about speed versus completeness. What is an example where something like that happens? When do you pick speed, and what happens to the completeness? Do you verify less or something to make sure there's speed? Yeah, so the example I gave about the monitoring system is very much like this, and we have this, right?
So we have a very fast query engine that we use for monitoring, not just monitoring but ad hoc queries for debugging, a lot of things. And not to get into the implementation, but effectively, with a lot of those queries you will often get back results based on only 80, 90, 95% of the data that should be queried for completeness. And it'll tell you this; it has an idea of how complete it was. But some of those things weren't fast enough or were failing or whatever, and so it just ignored those and gave the results from what came back. Which means, you know, your answer could be off by 10%. And for a monitoring system, what I care about is that I can query it in seconds, right? Whether it's for a monitor or for debugging. It keeps that fast at the cost of completeness, so that I can quickly do ad hoc debugging and figure out what's happening during an incident and move quickly, and that completeness is rarely gonna stop you. It quantifies it so you can see, and if you're like, no, I really need it, then there are things you can do and other data stores you can go to for completeness. But you typically don't need it. We don't use that for financial reporting or for something where the completeness mattered and we had time to wait. Thank you so much, Casey. Thank you. Can everybody hear me? Yes. Okay. Sorry, was having some technical issues with the laptop. And I guess we're pretty much on time. So, I'm Michael Gatt. It's good to be back at SCALE after several years, and I'm gonna be talking today about technology cost management, which is a topic I was first asked to speak about for a meetup group about six months ago, and this presentation has rolled through a few different iterations until we finally got to the one that you're gonna see here today. It's the last talk on Saturday. I'll try to keep it a little loose and casual, and hopefully you'll all learn something. Maybe I'll learn something. But I'd like to start with a story. This is a story about cats on keyboards.
It goes back to my first job many years ago. I was working for a large bank and I was part of a startup division of a bank that had taken some software I had written in a previous role and were applying it to something new. They brought over a banker from England who was going to run this startup division. We had a great launch. After our launch, we acquired a couple of big customers. We had a demo for a bunch of customers. They loved it. After that, we gathered and my, well, I guess he was a general manager of this division, told us we had done a great job. But at the end, he kind of gave us what at the time felt like a bit of a backhanded comment or a compliment, which was that you all have to remember. These customers love what we've shown them. They love our technology. They love the fact that we've mastered this technology that we're doing things that nobody else can do to serve their needs. But keep in mind that if somebody else can do the exact same thing by having a million cats, walk over a million keyboards and generate the same result for less money, they're gonna go with that competitor. And at the time, that kind of hurt. And as a technologist, it still kind of hurts, but it's true. We exist generally in the service of businesses. Those businesses have to make money. One of the ways they make money is by being careful about how they spend their money. And that's what this talk is about today. Key takeaway, that top bullet point is, engineering is where science meets economics. I don't know who first said this. I can't claim credit for it myself, but I do think it's true. As engineers, we learn a lot of science, but it gets really powerful when we are able to apply it in a situation where the economics work. Otherwise, we'd all be pure researchers off doing something else. So a little bit about me. I told you about my first job. I'm most recently a technical program manager from AWS, which I left last year. I've been an IT consultant. 
I've worked with open source. I've been a SCaLE volunteer for many years; this year, in addition to speaking here, I'm running the observability track. And what else? I'm originally from New York. I serve the needs of two cats, one of whom you just saw. That was Flash. He's named after a memory stick. Not kidding, there's a story behind that. Why talk about anti-patterns? Well, because patterns tend to matter. If you were here for Casey's talk just a few minutes ago, he talked a lot about: don't worry about the details, recognize the patterns. The same is true with dealing with cost, and giving specific proactive advice on any general topic is really, really tough to do. So I'm giving advice on what not to do, because I've found that if you avoid doing all the wrong things, you'll constrain your choices to things that are at least somewhat right. And that gets you, depending on the circumstances, anywhere between 50 and 80% of the way to where you need to be. And besides, it's just easier. This was actually intended to be a joke, but it really isn't: proactive advice is hard to give on a general topic, given that everybody has slightly different needs. There's a bunch of things I won't talk about. The biggest one that tends to come up a lot in costing in companies these days is forecasting and planning. That's just a huge topic that could be a presentation on its own, so I won't get into it. So, overall framework: there are two big buckets of spend for any tech org. It's the people and the infrastructure. Usually the people are the biggest piece of it. But Conway's law teaches us that our people and our systems tend to mirror each other. So the thing that I'd encourage you to keep in mind as we're going through all of this is that for every one of these anti-patterns in infrastructure, there's a parallel anti-pattern in what your organizations are going to look like or not look like.
And I'll go through various scales, or various dimensions: the scale of the enterprise, strategic plans, architectural issues, and lots of operational and tactical stuff. And the 10 points I'll go through will kind of hit all of these, starting in roughly that order. So, my top 10 used to be 12. The scale of the company is someplace you've got to start. I'm sure in this room we have everything from small startups to massive hyperscalers. They cannot look at costs the same way. One of the things I will say more than once in this presentation is: you are not Google. Well, some of the people in this room might be. But most companies are not Google. Most companies can't do the kinds of things I used to do at AWS, where you would get a half percent gain and, wow, we just saved $200 million. When you can save $200 million for half a percent, you can spend a lot on efficiencies. A lot of my old consulting clients here in LA would not save enough from an efficiency or a cost management project to justify the project, let alone doing the kind of ongoing continual improvement we did at AWS, where you needed a full-time team for it. So a lot of companies, and this is not unique to cost, tend to jump in and decide, we're gonna do that because we heard AWS does it. Bad idea. Bad cloud strategy. Most of us are going to be in the cloud to some degree today, unless your initials are DHH. Some of you are familiar with that person, in which case you spend most of your time ranting online about why the cloud is awful and how great you are for moving away from it. This is David Heinemeier Hansson of, well, he's the creator of Ruby on Rails, among other things. The cloud will generally not save you money, and I think that's the biggest thing to remember when you're coming up with a cloud strategy. Over the very long haul, it may, but the savings will be a bit squishy, in the sense that what it really allows you to do is move a lot faster and do more for the same money.
It's not that you will save money; in many cases you will spend more. There's no such thing as cloud neutrality. Forget about it. I've included a link in my slides, which I'll share with you, to just a very simple demo of: I'm going to move a simple containerized app from AWS to GCP. Some of you have tried this. It's not easy, and there's relatively little benefit to it. So a lot of companies constrain themselves on, well, we don't want to be too tightly tied to one cloud. That's not something you should do. Don't worry about being tied to one cloud. You probably are, whether you believe it or not. But which one you choose might matter, depending on what they charge and what services they offer for what you want to do. One of the bad things we tend to see, and this is typically true of older companies that have migrated to the cloud, is a lift and shift where you kind of treat the cloud as your data center. And you can gain some efficiencies there, especially by scaling things up and down in ways that you can't in a data center, but you're really not gaining the biggest benefits. And you need to be careful about how you do cost analysis. One thing I've learned through many years of moving stuff to the cloud is that if you want to gain the benefits, you probably need to re-architect around what the cloud allows you to do. You won't just move your architecture. One of the cost analyses that I linked to here was from a company called Ahrefs. I'd never heard of them, but they are apparently a huge SEO company based out of Singapore. They run a massive web crawler, and they published a paper where they said, oh, over three years we've saved $900 million by not going to AWS. Well, it turns out two things are true in that case. First of all, they violate the You Are Not Google rule, because they run the third largest web crawler on the planet, right after Google and Bing. So they have issues that most of us don't have.
The second is they were looking at, well, what would it take to move our precise architecture, based on lots of fairly intricate hardware decisions, to AWS? And that's not how you would ever do it. If they had started in the cloud, they would have a different architecture that would have completely different cost drivers. Another one we see a lot is no ability to assign costs well. And this is unfortunate, because if I had gone back 50 years to the mainframe era, some of you have heard of it, assigning costs was fairly trivial, because when IBM invented the mainframe, their whole concept of how computing would work was that there would be a bunch of these big things and you would slice them up and rent them out to people a little at a time. So tracking who was using what is embedded into every level of how an IBM mainframe works. I'm not old enough that I worked on mainframes, but I'm old enough and spent enough time in financial services that I've been around them, and we're still catching up to things that IBM could do on mainframes in 1960 in terms of tracking who's doing what and what the costs related to it are. It's especially hard in some organizations today where you have multiple layers. Yes, you have a consumer-facing application, but you'll have some sort of platform below that, and maybe you have some database platforms and off-site storage, and all of these interact with each other. So who actually owns the cost of that platform that runs a database, that also runs a Kubernetes cluster that your application runs on that uses the database? How do you track all of these things? It's a hard problem. Most companies don't do it well, and when I talk to AWS cost consultants, people who do this professionally, they say that's usually where we have to start: just having that conversation about what are the costs we're gonna assign to what pieces of the business.
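One common (though by no means the only) way to answer "who owns the platform's cost" is to allocate shared spend to tenants in proportion to a usage driver, so the pieces sum back to the whole. The teams, the CPU-hours driver, and the dollar figures below are hypothetical:

```python
# Allocating a shared platform bill out to tenant teams by a usage driver.
# The mechanism is generic; real arguments are about which driver to pick
# (CPU, storage, requests, data transfer) and how to measure it fairly.

def allocate(shared_cost, usage_by_team):
    """Split shared_cost across teams in proportion to their usage."""
    total = sum(usage_by_team.values())
    return {team: shared_cost * used / total
            for team, used in usage_by_team.items()}

k8s_platform_cost = 90_000.0  # hypothetical monthly cluster spend
cpu_hours = {"checkout": 400, "search": 350, "reporting": 150}

shares = allocate(k8s_platform_cost, cpu_hours)
# Nothing is left unattributed: the shares sum back to the platform bill.
assert round(sum(shares.values()), 2) == 90_000.0
```

The political fights the talk mentions tend to be about the driver, not the arithmetic; the arithmetic just has to guarantee everything sums back to 100%.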
It's usually easier when you use a lot more cloud-based services, although there's a lot more stuff going on on-prem that has borrowed from things that the cloud has done and a lot of open-source tools these days that make that a lot easier or are beginning to. We have horrible metrics. We usually have great technical metrics. In fact, we have more technical metrics than we generally can use. I like to refer back to this definition of what a metric is. It's a quantitative measurement that provides insight into inputs and outputs of a process. It's quantitative, that means you can compare it, ideally all the way across your organization. It is a measurement, which is to say it is generated automatically. It comes out of some system that is reliably and consistently capturing that number. It's not something that somebody typed into a spreadsheet. It provides insight. One thing about costing is there's no truth. There's insights. There's directional information. Truth in cost accounting doesn't exist. I'll talk a bit more about that later. And it's based on inputs and outputs. Typically with cost, it's dollars are the input. We put money in and we get something out. Very often when I'm talking to executives, they'll be saying things like, no, we just need to reduce our costs. And I'll say, no, we need to reduce our costs within certain constraints because it's costs per something. And the something may vary, but it is cost per something. And I'll get some pushback on that. And the go-to answer that I've been using has been, well, if it's not cost per something, then I can reduce your costs to zero. Give me the administrator login to your cloud account. I'll go in, I'll delete it, and you'll have no costs. And then, of course, the answer is, yeah, but that's not what I meant. Okay, so what did you mean? It's cost per something. What is the something that you want to come out? And that's where the interesting conversations start. Sorry, I advanced myself. 
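The "cost per something" argument is easy to make concrete. A minimal sketch with made-up numbers: the absolute bill rises month over month while the unit cost falls, which is usually the healthier signal to manage against.

```python
# "Reduce costs" only means something as cost per unit of output.
# These monthly figures are invented purely for illustration.

def cost_per_unit(total_cost, units):
    """Unit economics: dollars in divided by output that came out."""
    return total_cost / units

jan = {"cost": 10_000.0, "requests": 50_000_000}
feb = {"cost": 12_000.0, "requests": 80_000_000}

jan_cpu = cost_per_unit(jan["cost"], jan["requests"])
feb_cpu = cost_per_unit(feb["cost"], feb["requests"])

# The bill went up, but each request got cheaper: a cost-only view
# would flag February as a problem; the unit view shows improvement.
assert feb["cost"] > jan["cost"]
assert feb_cpu < jan_cpu
```

This is exactly the "delete the cloud account" argument from the talk turned into arithmetic: minimizing the numerator alone is trivial and useless; the denominator is what makes the metric mean something.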
We've gotta be careful with metrics, especially with costs, because we're engineers and we game everything. I was listening to a presentation yesterday about how, well, actually it was Forrest Brazeal's talk late last night, where he talked about how, oh yeah, somebody said we need to be running at 80% utilization, so they just inserted lots of waits into the code to always hit 80% utilization. And that sounds ridiculous, but I know of a situation where a company had a nasty habit of periodically telling everybody, cut 10%. And, well, you can imagine how that worked out. I'll talk about that a little bit more later as well. There's a bunch of stuff here you can read. I won't bother going through all the details of what Deming didn't say that you believe he said, or what Drucker didn't say that you believe he said. What I will say about metrics and management is: really, don't measure what you don't want to be forced to manage, because once you measure it, you probably will be forced to manage it, even if that was never the intent. OKRs are great, if done right, because they set an objective and let the people who are actually working the problem figure out what the target should be. Another anti-pattern. Now we move into architectural ones; the previous ones were pretty strategic. At an architectural level, sometimes we get the opportunity to design things right from the top, or right from a green field. And we should think about cost when we do that. Yesterday I had the opportunity to hear Corey Quinn, who's an AWS cost consultant, speak about something completely different, but one of the questions that was asked was: as a cloud economist, how do you recommend people design this stuff? And his answer was, you've got to design something to work. Don't spend all your time up front thinking about how much it's going to cost, because you don't know how it'll evolve.
But going back to the IBM example, think about what you might want to collect in the future and make sure you're considering that. Think about what the right architectures are given the state of the world today. 10 years ago I would not have been telling anybody that their default choice for an architecture should be containers and serverless, but today I probably would, if that's where you're starting. It may not be where you end up five years down the line, and you might want to think about an architecture that allows you to make that move as you grow to the point where serverless starts getting really expensive, if you get there. But also keep in mind you might not get there; serverless might remain the right place forever. So think about that. And there are lots of open source alternatives to help optimize this these days. Cost management as a standalone function: this is a topic near and dear to my heart because, as I say at the bottom, it's burnout inducing and leads nowhere. I know this personally. I spent a few months at a company during a hiatus from AWS where they asked me to basically do that. And it is a horrible, horrible thing to do. We're shifting everything left, and that includes cost. So if you set up this separate standalone piece of the organization that is going to manage our costs for us, but, oh, by the way, they don't write any code, they don't own any applications, they don't run any systems, they don't manage the platform, but they're gonna solve all the cost problems? Well, that doesn't work well. There can and should be an independent team to make sure that there is consistency across the organization, to make sure that metrics are captured, and that things are highlighted when they should be. Today we tend to refer to that role as FinOps. But it can't just be, oh, you're going to solve it, go off and solve it, come back later. It was a horrible experience. Ongoing review, or lack thereof: we don't do this well.
Even companies that try to do this well don't do this well. As I said, the point is to shift left. Shifting left means more sharing of information, quicker. A lot of companies don't want to share cost data. In some cases, the excuse tends to be, oh, that would share something proprietary. Well, yes and no. AWS costs, or any cloud's baseline costs, are not proprietary. You can look them up on a website. Now, you probably are under a legal obligation to not broadly disclose any discounts, but that doesn't mean you can't give the baseline, which is, if not perfect, because discounts can vary across products, at least directional, and provide feedback to the engineers who you are moving more and more of this responsibility onto. So that's a fail. If you're not willing to share cost data, most of this will not work in a fast-paced, again, shift-left environment. If you have the right metrics and you're looking at them, and you're reviewing your cost dashboard as regularly as you review your operational dashboard, you will see things, you'll see them quickly, and you'll be able to correct them quickly. You'll also usually be able to attribute the change to something you've done recently. I noticed that recently just in my own AWS bill: all of a sudden my S3 bill tripled. Well, it turns out I changed something in my internal backup storage that caused a whole bunch of information, well, a whole bunch of system folders in Windows from every single one of my machines, to be copied to my NAS, which is then replicated over to S3, which meant that every single day I was replicating an additional, oh, 20,000 files. So, yeah, that got expensive for a few days, but because I was reviewing it, I could figure it out very quickly, and I knew exactly what I had done. It wasn't that far in the past. Important questions: what can we turn off?
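A spike like the S3 one described here is exactly what a simple trailing-average check on daily spend catches. A minimal sketch; the window size, threshold factor, and cost series are invented for illustration:

```python
# Flag any day whose spend jumps well above the trailing average.
# Real tools do fancier anomaly detection; this shows the core idea
# of why regular review catches mistakes while they're still fresh.

def flag_spikes(daily_costs, window=7, factor=2.0):
    """Return indices of days costing more than factor x the trailing mean."""
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = sum(daily_costs[i - window:i]) / window
        if daily_costs[i] > factor * baseline:
            spikes.append(i)
    return spikes

# Ten quiet days, then a backup misconfiguration triples the daily bill.
costs = [3.1, 2.9, 3.0, 3.2, 3.0, 2.8, 3.1, 3.0, 2.9, 3.0, 9.4, 9.8]
spikes = flag_spikes(costs)  # days 10 and 11 stand out immediately
```

The point the talk makes holds here: the earlier you look, the smaller the window of recent changes you have to search through to find the cause.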
As a former AWS employee, I believe strongly that if companies actually reviewed and figured out all the logs they're keeping and how long they're keeping them for, S3 profitability could probably be crushed, because you might eliminate 30% of global data storage just by getting rid of them. Justin's laughing. At least 30%. Or maybe it's the opposite: maybe you'd keep 30% and eliminate 70% of global data. Just by not keeping logs that you don't use, don't remember you have, but are still paying for. And yeah, what can we turn off? What can we not store? And again, I'm not gonna talk about actual versus budget forecasting. That's a huge part of what FinOps organizations are doing these days. Again, to quote Corey Quinn from yesterday, he basically said, no, most of the companies who are talking to me aren't looking for cost reduction. They're looking for cost understanding and predictability. That's where you get to once you can figure out what your costs are in the first place. Another one I see is no reviews of what you have and how you could change it. I've seen companies that are still running ancient instance types. I'm not sure about the other cloud providers, but AWS never obsoletes an instance type. If you set it up years ago and you still ask for that instance, you will get it. And you will get the cost and performance of five years ago, as opposed to today. But you don't always want the latest one, because if you look at some of the most recent ones they've put out, they've actually started getting more expensive. Historically, you could always move to the next generation and it would be more performance for less, or more performance for the same. Now they're increasing the prices on the 7Gs. Is it better? You won't know until you benchmark it. One of the things I learned from the few months I spent at Stripe was they did a really, really good job of benchmarking.
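That benchmarking discipline largely reduces to comparing measured throughput per dollar rather than list specs. The instance names, request rates, and hourly prices below are illustrative stand-ins, not real benchmark results:

```python
# Deciding whether a newer, pricier instance generation is worth it:
# benchmark both, then compare work done per dollar. All numbers here
# are invented; only your own measured workload figures count.

def throughput_per_dollar(req_per_sec, hourly_price):
    """Unit-economics score for an instance under a given workload."""
    return req_per_sec / hourly_price

old_gen = {"name": "m6g.xlarge", "rps": 4200, "price_per_hr": 0.154}
new_gen = {"name": "m7g.xlarge", "rps": 4800, "price_per_hr": 0.163}

old_eff = throughput_per_dollar(old_gen["rps"], old_gen["price_per_hr"])
new_eff = throughput_per_dollar(new_gen["rps"], new_gen["price_per_hr"])

# In this made-up case the newer generation costs more per hour but
# still wins on requests served per dollar, so the move pays off.
better = new_gen["name"] if new_eff > old_eff else old_gen["name"]
```

The direction of the answer flips entirely with the workload, which is the talk's point: you can't know from the price sheet whether the 7G-era instance is better for you.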
They would not put a new instance type into service, or a new instance family into service, until they compared it to what they were already using and understood, mostly at a performance level but also at a cost level, what are they getting? What do they have to change when they go to that new instance type? Can they use fewer vCPUs, less RAM? They knew, and it's one of the things I thought they did really well. You know, there are new storage capabilities coming along all the time. A huge number of companies haven't moved to Intelligent-Tiering in S3 as their default, which, again, I would not have thought five years ago that I'd be recommending a default other than Standard, but today I am. It's a starting point. It's not where you will probably end up for most of your data, but understand what it gives you. It's a relatively new change, it's only a couple of years old, and there are new ones all the time, and prices change all the time. So review and benchmark; it matters. You might be able to reduce data transfer fees. That's what I spent three years of my time at AWS working on: basically EC2 networking, data transfer, PrivateLink, API Gateway. All of those are new things. They are things that change. You need to benchmark. You need to understand them. You need to understand the cost impacts of them. And again, I haven't talked much about cloud versus on-prem, buy versus build. These are also things you should be looking at. And these reviews tend to be different from regular cost reviews. These are more architectural, more done by engineering teams. It's why I've split out the two slides. But this is still something you need to do regularly, maybe not as regularly. This is maybe more of a quarterly or biannual type review. Across-the-board targets suck. There's no other word for it. I was a victim of this. Usually it reflects a management that is suddenly under pressure and doesn't really know where their costs are coming from. So they just say, everybody, cut 20%.
It's bad. I'll quote Corey Quinn on this one, on how it hits different teams differently. For one team, it's, we'll cancel a couple of projects. For another team, it's, I guess we'll all take a pay cut and work two to a laptop. A slightly exaggerated view, or maybe not, but it hits differently. One of the reasons for that is it particularly hits teams that have been doing the right thing. If you've been running lean and completely optimizing your spend, and the team down the hall has been running extremely bloated for the past two years, when cut 20% comes down from above, the team that's been running super bloated is the team that can meet that target tomorrow. Again, this creates those perverse incentives I talked about earlier. I know of at least one team in a company where this kind of thing was common; they knew, if we've got a down quarter, everybody's gonna be told to cut something. And there was a system there. I won't mention the company. It's a long time ago, but I still won't mention it. There was a system that was just constantly, continuously calculating pi to the millionth digit. And it could be triggered by feature flag. It could be dialed up and down as to how many parallel threads were working on this problem 24 hours a day, because when that order came down, you wanted to be able to quickly, over the course of four weeks, dial down your costs and declare victory, and then slowly dial them back up as things at the company got better and people were paying less attention, so you could be ready the next time. You don't want to encourage that kind of behavior. A lot of things that go on around cost, and this is where some of those quotes from Drucker and Deming matter, are really about motivating behavior. The wrong patterns, or the wrong anti-patterns, really motivate the wrong behavior. At the very least, it encourages back-pocketing.
These are the situations where, no, you're not gonna run a bad routine to just soak up CPU, but maybe you've got some improvements and you hide them behind a feature flag and wait until someone asks you to do it. You'll still do it, but you'll go six months before something that could have been put in production actually is. Don't do that. Regular review, regular ongoing understanding of the costs, is a better way. Finally, tools. I'm a strong believer in the simple truth that if you can't do it manually, a tool will never save you. Tools are great. They allow you to do things at massive scale that you would not be able to do if you didn't have that tool. But I used to be a project manager. I would tell people, no, throw away MS Project or Jira or whatever you're using. Here's a bunch of post-it notes. Show me that you can run a project on a whiteboard with post-it notes. I still believe that's true. If you don't get those basics, the tools won't help you. When it comes to cloud costing especially, most of the tools are immature. The space is still developing. I'm seeing a lot that look promising, but nothing I can strongly recommend right now for the general case, here and there for specific use cases. Yeah, there are good things out there, and there will certainly be more. I know of at least two in development that I'm really interested in seeing. Hopefully they'll let me get pre-production versions of those. But, you know, any tools are gonna require staffing, and they'll require changes to your process. Again, going back to that first slide: know how much you can spend, know how much it's worth spending to do all this. The amounts of money that I could spend to save money at AWS are far greater than I will ever see in my life anywhere else. So know what you need to do and make sure you find a tool that does it and does it well. Again, the space will mature a lot more, so it'll be easier to figure out than it is right now.
And remember, as with any vendors, these salespeople and engineers will not tell you all of the pitfalls, so you've gotta figure them out yourself. And then a bonus. In preparing for this talk, I checked out a bunch of videos online. I took a couple of classes. One of the ones I took was from A Cloud Guru, who used to be a great education provider, at least for cloud-related stuff. They're now owned by Pluralsight and have put out what I consider a lot of enshittified content, one of which is the AWS Cost Management Deep Dive, where one of the recommendations was: you should set up rewards programs to reward people for reducing costs. And there's something called the Cobra Effect. It's generally a perverse incentive. It comes from an almost certainly bullshit story about the British colonial powers in India, who decided, we need to get rid of the cobras in New Delhi. So they put out a reward for every dead cobra that people brought in. So people started breeding cobras to kill them and bring them in. And then supposedly the brilliant British authorities realized, well, we shouldn't be paying people to do this. So they stopped paying people to do this. And everybody, of course, because I guess they were stupid in the view of the colonialists, everybody just released the cobras in their own neighborhoods and compounded the problem. It almost certainly did not happen that way. But it's something to remember. If you're gonna pay people to cut costs, they'll just bloat their costs and then cut them. And then you'll change your mind, and you'll still be left with bloated costs. It introduces something that my friends on Wall Street refer to as IBG-YBG thinking. That's not a term we use a lot in tech, mostly because we don't get so much of our compensation in year-end bonuses. But in this case, they're recommending bonuses. So I thought I'd bring it up, because this is how Urban Dictionary defines that term. I think it's about right. This is where it comes from.
It's I'll be gone, you'll be gone. If you know you're gonna get a huge bonus for doing something, or even a smaller bonus, and you're already planning on changing jobs, well, do we really need all those backups? Is that redundancy truly necessary? I've talked to AWS and other cloud cost consultancies. They don't work on a commission basis. There are customers who'd like this, but they generally won't do it because they understand it sets up perverse incentives. And one of those people told me straight out, yeah, I could do that. And he said, I would not be trying to screw my customers, but I know where the incentives are. So if I ever do this, it has to be a big enough job for me to retire on, because I will completely destroy my own reputation. If you want to work on commission, go into sales. Don't go into cost savings on technology. Okay, let me just look at the time. We're kind of getting up there. I was gonna quickly talk about a few things you could do right now; I'll just go through the first couple quickly. Whether your company is attributing costs or not, start thinking about how you might do it, because you will either figure it out or someone will figure it out for you. It is better that you at least have thought through the problem, especially if you're in one of those weird platform-type teams where your costs are coming from somewhere three layers of abstraction above you. How are you going to figure out who pays for what piece of your budget? It's something to do. What's that? This is old stuff, I went to the end. That's unrelated. But learn something about cost accounting. It's weird to stand at a tech conference and say you should all be accountants. But I started from the premise that engineering is where economics and science meet, and I still think that's true. Casey, who just spoke before me, spoke about the thing that he learned years ago that he didn't think had anything to do with tech, that he has now figured out does.
And I feel the same way about cost accounting. It's about the only thing I ever learned in business school that still has relevance. And in fact, it's one of the threads that has woven through my career consistently over decades. It's a way of thinking. Cost accounting is not the stuff you use to put together financial statements. As I said, costing is imprecise. This is about a language for talking about who owns what costs, where do we attach them? It's a language that the people you work for speak. It helps you to understand it. I'm not suggesting anyone should go and spend a lot of time on this, but if you understand the principles of what is known as activity-based costing, you're so far ahead of anybody else who's probably in the room when you're talking about how much you're spending. The CFO will love you. And think a lot about where you and your team are spending your time. This isn't directly cost-related, but I think we've become very attached to a lot of tools that claim to make us productive and don't. So think about those. I won't say much about it. Thanks to all the people who've helped me do this, to the various meetup groups who I've presented earlier versions of this to, and everybody at SCaLE, and especially to these two, who, well, do literally keep me up at night, but also keep me sane, and that's mattered a lot this past year. I'll take questions. Thank you. I guess we don't have a room person. Yeah, Justin, another former AWS colleague who isn't there anymore. So for on-prem, what software do you recommend? Boy. You know, that is, as I said, the one thing that's really difficult to do is recommend a specific solution to a problem. It depends what you're running, and there are so many variables that go into that. On-prem in a lot of ways is easier, because you have a fixed cost to your hardware and the rest of your infrastructure that doesn't change much. I'm just saying, right now I use my dad when he gets mad at the electricity bill.
That's a good proxy. Oh, the electricity bill is a pretty good proxy for what your costs are, but the thing about it is, what is driving those costs? What are you running that you need to monitor? Data ingest, because I do data analytics on public transit, so. We pull in half a terabyte every single day, so. That sounds like fun. And at that cost, in the cloud, I'd be immediately saying, I'll bet there's a whole bunch of network stuff and data transfer that you're not even recognizing is there. I calculated it; every single month it would be like around $5,000. Yeah, I mean, again, that's almost sort of a, I need to look at your systems and see what you're doing to understand what's driving the costs. How much would those costs really change on a month-to-month basis if you were doing something different? On AWS, it would be exactly the same. On-prem, I get to, well, I turned off my NAS, so that's a lot of money saved. Okay. Because hard drives use a lot of electricity, so. Yeah, I mean, again, it depends what systems you're running it on. The tools are just really hard to say. As I said, there are lots of tools out there; which ones are right is so situational that without knowing a bit more about it, I really can't answer that. Okay, I'll talk to you after. Yeah, we can talk a bit more. We've got someone here. Oops. Hi, I'm an engineer. But if I'm understanding, in software, the biggest cost is often people, and given the economy, there are all these layoffs. So defensively, how can we gauge, I guess, our own presence as a cost to the company, and how do we run this on ourselves in the context of our jobs or companies and such? Good question. At the very start, and I don't know if you were here at the very beginning, I pointed out that there are two big buckets of cost in every tech organization. There's the infrastructure and there's the people.
And if you're familiar with Conway's law, it basically says your system architecture and your organizations will always tend to mirror each other. It's not precise, it's not perfect, but it's a pretty good approximation in my experience. As we're shifting left and engineers are now managing infrastructure as well, well, your infrastructure organization and your people organization are running into each other and paralleling each other much, much, much more. So in terms of managing your career, you want to be focused in the places that can't easily be cut. And if you want to parallel it to infrastructure, what can't go away? As I said, if you're the person who's managing the huge percentage of data out there that is useless logs, that can go away. We'll go to the back. This is actually related to your point about cost accounting. Yes. So in the companies or organizations that you've been working with, does the chargeback versus showback model work when it comes to transparency and accountability with the cloud teams that you kind of... I'm sorry, the chargeback... Chargeback versus showback. Yes. Which model do you see works better for accountability when it comes to distributing costs among teams? Boy, that's another great it-depends. I'd be happy to take that one offline. There are so many, again, as I said, a lot in costing is really vague, and specific accounting procedures, the answer, again, is it's extremely situational. I would say there's not an inherent advantage to one or the other. A lot of the complexity there is in the details of how you are attributing and accruing things. So I have a question, as an engineer. Yes. I've worked on teams where our products were supporting the entire organization, like, let's say, login or databases. How do you separate or attribute either costs or perhaps profits to those kinds of platform-level organizations or teams? Once again, there is no one answer.
Historically, the answer for a lot of those things was: we'll just consider it overhead. But then you get the question of, well, who gets which portion of the overhead? Because typically, in any cost accounting, whether it's standard costing or activity-based costing, which tends to be what we use in developing software, you're going to want to attribute out that overhead. And I've seen a bunch of different approaches to it with data platforms. It can be the amount of data stored, or it could depend on data created or data used, or the CPU that goes into pulling out the data. And very often, if you don't do this well, you either undercount or overcount. So for things like logins: a login is probably not a big enough piece of the pie to spend a huge amount of money on, so you'll probably figure out some per-login type of cost. But those are the big questions, and they tend to be very political questions, which is why I said that with attribution, the sooner you do this, the better, because that's where a lot of the hardest conversations are going to happen. I have seen endless debates on just the question I was talking about: we've got this data platform, and it uses CPU, and it uses storage, and it uses data transfer, and who owns what piece of that, and how do we allocate it out so that it all sums up to 100%? It's a multi-dimensional problem, and it's difficult.

Just asking a question as an engineer: in the past, what I've noticed is that a lot of scaling decisions are based on the worst-case workload, and in some spaces you tend to have a very spiky workload at the beginning of the month and the end of the month, right?

Yeah.

So, are those just inevitable costs in your view, or what's your take on any optimization we can do to keep the bill low?

Boy, Justin and I were just having a huge conversation about this yesterday.
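[Editor's aside, on the overhead-allocation question just above: the "allocate it out so it all sums up to 100%" constraint is easy to state in code; picking the dimensions and weights is the political part. A minimal sketch; the teams, usage figures, and weights are all invented for illustration.]

```python
# Splitting a shared platform's monthly cost across consuming teams by
# a weighted blend of usage dimensions. Teams, usage figures, and
# weights are all invented for illustration.

def allocate(total_cost, usage_by_team, dimension_weights):
    """Return {team: dollars}, summing back to total_cost.

    usage_by_team: {team: {dimension: amount}}
    dimension_weights: {dimension: weight}, weights summing to 1.0.
    Each dimension is normalized to per-team shares first, so the
    blended shares always add up to 100% of the cost.
    """
    totals = {d: sum(u[d] for u in usage_by_team.values())
              for d in dimension_weights}
    return {team: total_cost * sum(w * u[d] / totals[d]
                                   for d, w in dimension_weights.items())
            for team, u in usage_by_team.items()}

usage = {
    "checkout":  {"storage_gb": 8_000,  "cpu_hours": 1_200, "egress_gb": 500},
    "analytics": {"storage_gb": 30_000, "cpu_hours": 300,   "egress_gb": 2_500},
    "login":     {"storage_gb": 2_000,  "cpu_hours": 500,   "egress_gb": 1_000},
}
weights = {"storage_gb": 0.5, "cpu_hours": 0.3, "egress_gb": 0.2}
shares = allocate(100_000, usage, weights)
print(shares)  # the three shares sum back to the full 100,000
```

The arithmetic closes to 100% by construction; the endless debates the speaker mentions are over what goes in `weights`.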
We've been moving more and more toward less overhead: targeting the 90% case, not the 60%, or targeting 90% utilization, not 60% utilization, and depending on scaling capabilities and the other capabilities the cloud gives us to bump that up. And that's especially true if you're talking about scheduled events, which I dealt with at AWS. My biggest customer for several years was Amazon Retail. And, well, we knew what the big days were going to be, so we could manually scale. Remember, manual scaling is kind of a dirty word in DevOps circles these days, because you want everything to be automated. But if you have set dates, if you have month beginning, month end, if you're in the tax business, mid-April, it's okay in those cases to do it manually. In many cases, it can be better: you will spend less doing it manually than you will trying to automate it imperfectly. And as I said, AWS does it. So if AWS can't be bothered with automating that, you probably can't justify it either. Again, it goes back to that scale question. We've got about five more minutes.

Hi, does this work? Yeah. So, I've gradually noticed more and more how 20, 25 years ago the big infrastructure providers were the database companies, like Oracle, for example. And I feel like the circle is now complete, because you were talking about not trying to save by being cloud-agnostic, and it reminds me of a famous book from 20 years ago by an Oracle VP about not trying to be database-agnostic. So I feel like the circle is now complete, and the infrastructure moved from software companies to cloud providers.

That is true to a degree. But it's not even a circle, it's a cycle, because I've been in this for a while, as I'm sure you have. I can remember when IBM was telling you, no, just plan around us. And then we moved to, no, you can run it on anything, any PC anywhere; that was what Microsoft told us to do.
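[Editor's aside, picking up the scheduled-scaling point from a moment ago: "manual" scaling for known busy dates can be as little as a capacity plan keyed to the calendar. The baseline and multipliers below are invented, and a real deployment would feed the resulting number into whatever scaling API you actually use.]

```python
# "Manual" scaling for known busy dates, written down as a capacity
# plan keyed to the calendar. Baseline and multipliers are invented;
# a real setup would feed this number to your scaling API.

import calendar
import datetime

BASELINE = 10          # normal instance count (assumed)
MONTH_EDGE_BOOST = 3   # first/last two days of any month
TAX_SEASON_BOOST = 5   # the week around April 15

def planned_capacity(day: datetime.date) -> int:
    last_day = calendar.monthrange(day.year, day.month)[1]
    if day.month == 4 and 10 <= day.day <= 16:
        return BASELINE * TAX_SEASON_BOOST
    if day.day <= 2 or day.day >= last_day - 1:
        return BASELINE * MONTH_EDGE_BOOST
    return BASELINE

print(planned_capacity(datetime.date(2024, 4, 15)))  # 50 (tax peak)
print(planned_capacity(datetime.date(2024, 1, 31)))  # 30 (month end)
print(planned_capacity(datetime.date(2024, 6, 12)))  # 10 (ordinary day)
```

A table like this is trivially auditable, which is the speaker's point: for set dates, that can beat an imperfect automation.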
And then we consolidated around some big software companies, at least for business purposes, like Oracle, like SAP. And then we went toward open source and Linux and everybody running their own servers, and now we're consolidating into cloud providers. It ebbs and flows, so you're not wrong. I think it's the nature of our business to consolidate things once they hit a certain critical mass. And then we're going to figure out the next thing, and that is not going to be consolidated for many years to come. And eventually it will. And the costs, again, will move up the food chain, and the people lower down. I think we have time for one more question.

One of the challenges I ran into relatively recently was trying to help a customer with their Azure bill, to the degree that they couldn't even explain their bill.

Oh, most people can't explain their bill. That's the business of most AWS cost consultants; those of you here at SCaLE may know Corey Quinn, and that's his business. He will tell you: my business is explaining the indecipherable AWS bill. Understanding it is key. That's why I started from: figure out how to attribute your costs; take this huge bucket that shows up as an S3 line item and figure out how to split it up. There are lots of tools to do it with, and the same is true for Azure. Use different account numbers. Another mistake people make is running everything in one account. The cloud providers used to tell you to do that; they don't anymore, because they realized that's a horrible way to keep track of things. Use tags, use all of these tools they give you, to be able to say this thing belongs to that organization. Then you can start, maybe not deciphering the bill, but at least understanding where the costs are coming from. A huge amount of the complexity in the bill is that there are some things you can just never easily predict, like cross-zone traffic on a cluster that runs in multiple zones. That's unknowable in advance.
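[Editor's aside: the tagging advice above is mechanically simple once the tags exist. A sketch of rolling a bill up by a cost-allocation tag; the line items and the `team` tag key are invented, and real AWS/Azure cost exports carry many more columns, but the mechanics are the same.]

```python
# Rolling an undifferentiated bill up by a cost-allocation tag.
# Line items and the "team" tag key are invented; real cost exports
# have many more columns, but the mechanics are the same.

from collections import defaultdict

line_items = [
    {"service": "S3",  "cost": 812.40,  "tags": {"team": "analytics"}},
    {"service": "EC2", "cost": 1504.10, "tags": {"team": "checkout"}},
    {"service": "S3",  "cost": 95.00,   "tags": {}},  # untagged!
    {"service": "NAT", "cost": 230.75,  "tags": {"team": "analytics"}},
]

def costs_by_tag(items, key="team"):
    """Sum line-item costs per tag value; untagged spend is surfaced
    explicitly rather than silently lumped into someone's total."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(key, "UNTAGGED")] += item["cost"]
    return dict(totals)

print(costs_by_tag(line_items))
```

The `UNTAGGED` bucket is the useful output here: its size tells you how much of the bill you cannot yet attribute to anyone.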
For what it's worth, you talked about logs, and it made me think: backups were the big surprise for this customer. Their storage was off the charts, and I was like, there's no way my project created that much storage. When we dug into it, it was their backup policy.

Yeah, as I said, my personal backup at home tripled my S3 bill because I changed one configuration setting in the backups, and I didn't even think about it at the time. Fortunately, I review my bill often enough that I saw this happen three days after I made that minor configuration change, and somewhere my brain put two and two together and said, oh, that must be a result of this; I'd better change that setting back. Anything else? I'm happy to talk to people.

Yeah, and you'll be around for more questions.

Yeah, I'm here. Thank you.

Okay.
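[Editor's aside: the "review your bill often" habit that caught that backup change can be mechanized as a day-over-day jump check on a per-service cost series. The numbers below are invented; the jump on day 3 mimics a backup setting tripling a daily S3 charge.]

```python
# Flag day-over-day jumps in a per-service daily cost series, i.e.
# the "review your bill often" habit as code. Numbers are invented;
# the jump on day 3 mimics a backup setting tripling daily S3 spend.

def flag_jumps(daily_costs, threshold=1.5):
    """Return indices where cost grew more than `threshold`x overnight."""
    return [i for i in range(1, len(daily_costs))
            if daily_costs[i] > daily_costs[i - 1] * threshold]

s3_daily = [4.10, 4.15, 4.12, 12.40, 12.55, 12.60]  # $/day
print(flag_jumps(s3_daily))  # [3]: the day after the config change
```

Run against yesterday's cost export on a schedule, a check like this turns a three-day detection lag into a one-day one, which is the whole value of reviewing the bill.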