Please welcome to the stage Executive Director of the Linux Foundation, Jim Zemlin. Good morning everyone and welcome to our annual member summit. This is my favorite event. Chris Wright, the CTO of Red Hat, likes to describe this event as the hallway track where the entire event is the hallway track. This is one of the funnest crowds to speak in front of because I'm coming up on my 20th anniversary at the Linux Foundation, so many of you I've known for more than a decade, and we've worked together to accomplish so many amazing things. Before I get started I want to make sure I thank our sponsors, AWS and OpenSearch. I want to also thank our Gold sponsors, the Cloud Native Computing Foundation and Google. We can't do these events without our sponsors and we really appreciate it. Like I said, this is my favorite event. I feel like in this crowd we can be intimate with each other in talking about things where we may disagree. Imagine, in open source, that we disagree about certain things. But we can also be comrades and be open with each other in such a meaningful way and continue to grow our communities and build more consensus, which is of course the whole point of everything we do. So now let's talk about some things that we all work together on. Things that unite us, right? One of the fun things about this event is that we get to talk about the art and science and craft of open source. This is the PhD-level open source crowd here at the Linux Foundation member summit. And today I thought I would talk a little bit about what foundations themselves, organizations like the Linux Foundation and many other open source foundations, do. I'd like to talk about what we do, why we do it, and how we do it. How do we create the kind of success that we see in open source? Why do we even bother doing it?
And I often get asked, how does an organization like the Linux Foundation, or how do open source projects, define success? And I'd like to offer my thoughts on that. I think the biggest definition of success at the end of the day is impact. How many lives did you impact? How many markets did you change? How many things did you improve in the world? After all, that's why we're all doing what we're doing. Of course, we're collectively working together to create open source code, to build communities. But at the end of the day, all of that work has to amount to something. And impact is what it's all about. Twenty years ago, when I started this job, I would never in my wildest dreams have thought that Linux would be the most widely used and important software in the history of computing. And I bet you, if you had asked Linus Torvalds in a moment of honesty 30 years ago, he wouldn't have thought so either. He just never would have predicted it. But the impact has really been tremendous. It has led to incredible innovation. And that's the kind of thing I think we all want to continue to achieve in our collective work. And at the Linux Foundation, we definitely want to have impact. But before we talk about impact, we have to ask, what exactly is it that we do? Recently there's been news about open source license changes, and questions about whether foundations are a good thing. Is open source going to change? And I can give you all a sigh of relief: open source will continue, and foundations will continue to do good work. I'm not worried about that. But I do think it's important to be clear about what foundations actually do. And for those of you who were here last year, I'm going to brush off my old meme, a sort of early-2000s internet meme, to describe what foundations actually do.
So my friends all think that my job at a foundation is going around to exotic locations, giving keynotes, and having a wonderful time. They're unclear on what I actually do. Actually, my wife isn't even quite clear on what I do. But that's what people think foundations do: it's all a bunch of marketing, and everyone is just having fun at these events. My mom thinks that I'm practically saving the world and that it's so lovely. And you know what? Even if I had a totally different job, she'd probably think the same thing. I think society still thinks that we're all a bunch of hippies giving stuff away for free. Nothing wrong with that, right? Especially here in California. Developers obviously think this is what we're doing. Tonight I'll be out in the lobby area with my cigars and $100 bills for those of you who care to join me. This is what I think I do. I think I'm just so great at what I do. I have to have some kind of solace in that. But this is what foundations actually do. I love to use this metaphor: we're kind of the janitors of open source. And of course, when you're the janitor, the bathroom is never quite clean enough. What open source organizations like the Linux Foundation, or the Eclipse Foundation, or the Apache Software Foundation do is all the stuff around the core of open source, which is to create these wonderful projects, these amazing innovation engines. That work has to be done. Somebody has to organize the event. I think we have a pretty darn good event team. Somebody has to promote the project in order to get more developers to come. Somebody has to manage the IT infrastructure and all of the build environments and so forth. Someone has to underwrite all of that. Someone has to help teach people how to use the technology and come into the community quickly.
Somebody has to manage all of the intellectual property at the core of what we do; whether it's a trademark, a copyright, or a patent, somebody has to do this work. To be clear, this is what we do. What we don't do is engineering; we're not actually an engineering organization. The Linux Foundation only employs a very small number of engineers. One of them is very cantankerous and doesn't listen to anything that I say. I think we know who I'm talking about. But what the Linux Foundation is trying to achieve is to be a new type of innovation organization, whether that's doing all the hard work that isn't code related for an open source project, or developing a global standard, or creating data sharing so that artificial intelligence models can consume large data sets in a broadly shared way. This is the stuff we do. The Linux Foundation is only home to about a thousand core open source projects. When I tell people that, they're like, aren't there like eight gajillion open source projects on GitHub? A thousand seems really, really small, right? And it's counter-intuitive. But what a foundation really does is focus in on those really important projects that we all collectively depend on. There actually is a very long tail of open source, and that's not to diminish the long tail. But it's to point out that projects like Linux and Kubernetes and PyTorch are just critical to all of us collectively. And that's why, even more, you need to create sustainable ecosystems around them, because people will indeed, as they have for 30 years, count on this technology for a long period of time. And this is what those ecosystems look like. I want to remind us. Many of you have seen this, and I want to keep reminding you. The Linux Foundation is almost like a weird reverse VC where instead of looking for product-market fit, we're looking for project-market fit.
You know, is this open source project something that can move the needle on society? Is it good technology that can be used widely for really important tasks? Can we get the financial underwriting? Can we get industry and society to collectively invest? If so, that project becomes products, like a cloud service or internet search or a mobile handset or an embedded system, and those products create value in the form of profits or in benefit to society. And when companies and society create value, they then reinvest. Largely not in the form of investing in the Linux Foundation, although I do ask all the time for people to invest in the LF; the big investment in open source is in engineers who find bugs, who improve code, and who contribute back to that project, which begets better products, which begets more value, which begets better projects. That's the virtuous cycle. And here's all the stuff that you have to do to support that virtuous cycle of development. It's just a large set of things. At the Linux Foundation, we have over 350 employees who support all of these endeavors across these critical projects. And one of the fun ways I like to think about the Linux Foundation in terms of facilitating this work is just how leveraged an engineering effort it actually is. This is one of my favorite statistics: $26 billion. We track the developers who work in our communities every day. It's roughly 600,000 developers on any given day. And so as an experiment, I took the average pay for a developer, from the highest in the United States to the lowest in parts of West Africa, and came up with about $43,000 a year as the average cost of a software developer across the globe. And if you multiply that by the number of developers in our community, it would equal a $26 billion payroll. And one of our speakers today is from Microsoft, so he can correct me if I'm wrong on this, Mark.
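As a rough sanity check, that back-of-the-envelope estimate works out as follows. This is just a sketch using the approximate figures quoted in the talk; the variable names are illustrative, not from any Linux Foundation source:

```python
# Back-of-the-envelope estimate of the implied "payroll" represented
# by the developers active across Linux Foundation project communities.
daily_active_developers = 600_000   # developers active on any given day (approx.)
avg_developer_cost_usd = 43_000     # blended global average cost per developer per year

implied_payroll_usd = daily_active_developers * avg_developer_cost_usd
print(f"${implied_payroll_usd / 1e9:.1f}B")  # prints "$25.8B", i.e. roughly $26 billion
```

Which is where the $26 billion figure, and the comparison to Microsoft's R&D payroll, comes from.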
Microsoft's R&D payroll, I think this is from last year's annual report, was $24 billion, and Microsoft is the largest software company in the world. But if you compare it to the Linux Foundation, we're giving Microsoft a pretty good run for the money in this leveraged outcome. And so that's what we do at the Linux Foundation. And we have all kinds of metrics for the what. Are we growing developers? Are we adding members? You know, last year we trained, cumulatively, three million people. Our event team ran 256 events with 120,000 attendees across over 70 countries, just an amazing amount of work. But the most important thing is why, why we do it. And that comes back to this impact. And today it's incredibly hard for even me, who's been at it for 20 years, to keep track of all the tremendous impact that happens across all of these amazing open source projects. And so I'm going to show you some busy slides, and I'm just going to pick a few things out of each to give you a sample of the kind of impact we're seeing at the Linux Foundation this year. And what I would encourage you to do is pick your favorite open source project or open source organization, go find these same stories about impact, share them with other people, and remind folks how important the work that we all do is every day. In edge computing, in open source advocacy, in cloud computing, this year the Linux Foundation did a ton of work. Our team, collaboratively with many of you in the room, worked to provide a clear voice on the Cyber Resilience Act in the EU, which could have a real negative impact on open source, and to help work with regulators there to correct well-intentioned but misguided legislation related to open source. Our organization also worked collectively with our community to defend against USPTO rule changes that could negatively impact open source collaboration.
Like I said, we passed over three million course enrollments, training a new generation of developers on open source technology. And while Linux may not be the shiny new thing, it just continues to be the underlying platform for the majority of the world's computing systems. In other areas, AT&T is now seeing a 40% reduction in their operating expenses through virtualizing their infrastructure and using our open network automation platform to orchestrate this software-defined modern network. A 40% reduction in OPEX. That is incredible. Kubernetes, it goes without saying, has fundamentally changed the way new cloud applications are built and maintained. New projects like CNCF's Backstage are resulting in massive savings. I love that we talked to the folks at Toyota. Toyota, just one company, is saving five million dollars a year due to a better developer experience using Backstage. I mean, that is real impact. Think about what you could do with that extra five million dollars. In AI, one of our projects, PyTorch, is the backbone of machine learning and large language models. We're home to numerous critical standards and software building blocks that are used to create those large language models, those foundation models behind things like ChatGPT that we're all hearing about, that will fundamentally change our lives. But those all require these open-source building blocks, these open data sets, to actually make that happen. The Open Source Security Foundation has, after years, finally gotten real momentum with Sigstore, to have actual package signing in the distribution of the software we all collectively depend on. And from hearing aids to Chromebooks to small embedded systems, the Zephyr real-time operating system project is not only helping to create these devices but is extending battery life in meaningful ways, so that that hearing aid can last longer, so that that wearable that can monitor your health can last longer.
These are all incredible impacts, and I could go on and on and on. In our energy projects, a Dutch grid operator is performing power system analysis 10 to 100 times faster with our open-source project, the Power Grid Model. RTE, the national electricity transmission operator for France, is using one of our open-source energy projects to reduce their operating costs by 50%. This is real impact. This is why we work on the stuff we do. It is directly helping with climate change. It is amazing to see this kind of work. And it's not just market impact or societal impact. One of my favorite things that foundations do is provide personal impact. You know, I love going to our events and talking to developers, talking to the young folks who come to our events. And you know, every year we've offered millions of dollars in funding to help people from underrepresented communities, to help people who have lesser means, to come to our events. And that has resulted in real job opportunities that are life-changing. I'll never forget being in Shanghai one time at a meeting. There was a woman at the back of the room who ran up to me afterwards and said, Jim, Jim, Jim, Jim. I just, I gotta talk to you. I kind of freaked out a little bit. I'm like, what is this? She's like, you changed my life. This woman was from Brazil, had gotten a travel sponsorship to come to one of our events, met up with Tencent at the event, got a job, and moved to Shanghai. A completely life-altering experience. And we have those over and over and over again at the foundation. It's one of the funnest things about the job, and you are all a part of that story. I mean, think about how many of you have talked to folks who've been at our events and helped them get their career started and change their lives in a meaningful way. That's why we work. We're working on diversity and having real impact there as well.
$1.6 million in community travel funding just this year alone to bring people from diverse communities into our world. And we get to hear about the impact of that, and we know that diverse communities are better communities. The Linux Foundation itself walks our talk. We have a diverse staff at every layer of the organization, from the board to the management team to the staff itself. And I think these are the important things. That's why we do what we do. The last thing, though, that I want to leave you all with is a little Linux Foundation insider secret, which is how we do it. This is the thing I don't actually talk all that much about, right? People are like, oh, we know the Linux Foundation, they're doing all this stuff and having this impact. What's the secret? Why has the Linux Foundation, or other organizations like ours, been able to do that? How? And I want to share the internal answer. This is for new employees at the Linux Foundation; all Linux Foundation employees will recognize this from our collective all-hands. It's the culture and training how-to for working at the Linux Foundation and for working in open source communities. I think this doesn't just apply to being an employee at the Linux Foundation. I think it definitely applies to working with open source communities and large collective efforts, and it probably also applies to life, right? These ideas aren't that complex, but I think they're sage and wise. And the catchphrase we use is helpful, hopeful, and humble. You know, at the core, to be the janitor, you have to want to help people. You have to want to make things better, right? To do all that work to enable the brilliant minds who create this code, you have to have a true belief in the best in people. Anticipate the needs of others, of developers, of sponsors who are contributing to these communities. You have to be able to add value by listening, validating, and solving problems in the context of community members' needs.
That may be different than the answer you're thinking of immediately, but if you're striving to be helpful, you'll always get better outcomes doing all of that hard work that needs to be done to support these collective efforts. And of course, you also cannot do this without being optimistic and hopeful. You know, optimism is a sign of strength and stability. And if I had a nickel for every criticism of the Linux Foundation or of me personally, well, it just happens. The open-source way is generally a way of critique, right? We work on source code together, we critique that source code, and sometimes that criticism can get pretty loud and pretty tough to deal with. And if you're hopeful, if you know that even though things are hard there will be a better outcome in the end, that our best days are ahead of us, then you can iteratively solve problems with developers and community members in consideration of better outcomes. But I think the last H in the helpful, hopeful, humble culture at the Linux Foundation is the most important one, which is just humility, right? Humility is knowing that you don't have to be the center of attention to help have big impact. That you don't need to be the person who gets all the credit, that you succeed when your community succeeds. Remember, the Linux Foundation's job, the job of most foundations, depends on people getting work done, none of whom work for us. If we can't lead through influence, we can't do our jobs. And leading through influence, leading from behind, is an act of humility, of constant learning, of recognizing there are just different ways to solve problems than the ones you might know about. And if you have that sense of humility, that's really the secret to the success of working at an organization like the Linux Foundation or in the open source community in general. And that's where I want to leave you all: with how important those things are.
I think we all understand what we do. It's important to remember we're not a developer organization; we do all that hard stuff around the edges in order to make that brilliant development work happen. We do it in order to have impact. And the way we do it is by being helpful, being optimistic, but most importantly, humble. And you know, recently I have been humbled. For those of you who can't quite see it, I recently faced a health challenge myself. I've got a big scar on my face. I was diagnosed with a form of skin cancer that needed immediate care. Thankfully, I underwent surgery, and just to get ahead of all the questions when you see me up close and you see the scar: it was a complete success, no cancer. But it has been a reminder of the fragility of life, especially in these incredibly unpredictable times. You know, I can't help but think of individuals around the world who face their own battles, many without the health care resources that I've been fortunate enough to access or a circle of support around them. Their struggles are so much in contrast to my own situation. I'm just grateful for the medical attention I've received, for the friends and family who've helped me, and for all of your support. It's a privilege I don't take lightly, and it's just reinforced my humility and my empathy and desire to support other people. You know, I'm reminded of the words of Helen Keller: alone we can do so little; together we can do so much. Whether it's facing a challenge in health, personal turmoil, or the immediate pain affecting us in the world and in the Middle East today, it is our togetherness, our collective empathy, our unwavering support for one another that help us navigate through these difficult times. At the Linux Foundation, we are so lucky, and I speak on behalf of our entire staff, that that is essentially our full-time job. We get to build communities. We get to bring people together. And so that's what I want to leave you all with today.
Let's be there for each other in all the ways that matter. Thank you. And with that, I am going to introduce our first talk of the day. This talk is incredibly timely. It's on the topic that everybody wants to talk about these days: AI and the future of the open-source developer. Fun fact: two of our speakers are close friends and have been married for quite some time. I think you'll recognize who they are. But I'm going to let our moderator handle the introductions for everyone. So please welcome Erica Brescia, Jono Bacon, and Beyang Liu. All right. Good to see everybody, and so many familiar faces. Thanks for being here today, and thanks to both of you for making the trek down. So my name is Erica Brescia. I started life as a founder in the open-source world, then went on to be COO of GitHub. Now I've moved over to what some call the dark side; I'm a venture capitalist, but I also proudly served on the Board of the Linux Foundation for about eight years. Obviously, as an investor and just an open-source citizen, AI and the future of open source is something very top of mind for me. So I really look forward to diving into what I hope will be the start of an ongoing, really interesting conversation. Before we dig into that, I'll let you two introduce yourselves. Jono, we all know you have outstandingly good taste in spouses. So we'll start with that. Have you met my wife? Most important. Yeah, tell everybody a little bit about what you do. So, hey everyone, I'm Jono. I founded an accelerator called the Community Leadership Core, where we help companies who've invested in building communities and DevRel to get quarterly results in growth and engagement. We provide coaching and training on all kinds of good stuff. I'm Beyang, CTO and co-founder of a company called Sourcegraph.
We build developer tools, so we're the makers of Sourcegraph, the code search engine, as well as Cody, an open-source AI coding assistant that makes use of the code search engine to better write code and answer questions about your specific code base. So, you know, obviously open source is very much all about code. To kick us off, what are some of the most interesting or surprising learnings you've had since releasing Cody into the wild? Yeah, so for background, Cody is our open-source AI coding assistant. It's something that we started building about a year ago, actually about a month before ChatGPT came out. And we built it, we got it on the marketplace, and it's similar to a lot of the other AI coding assistants that people have used. We do inline auto-completion, there's also a chat-based interface, and there's a set of commands that does things like generate tests, write documentation, and so on. It's better though, right? Yeah, we like to think so. We think the main point of differentiation is that it has the context of your code base. It's able to use code search and code navigation, things that we've spent the past 10 years building at Sourcegraph for human developers. Now we're handing that to the AI, and it turns out that helps a lot with the quality of the code generation. But in terms of surprising use cases, it's been surprises around every corner, both good and bad, actually. So I'll give maybe one good and one bad surprise. A good surprise has been everywhere that people are finding use cases for it in the inner loop of the dev cycle. When we first started out, we thought the main use case would be writing code, right? Because the first version of code AI that was popular was this inline auto-complete tool.
We found that a lot of what people are using Cody for is actually not just writing code, but understanding what's going on in the code. It turns out AI is very good at translating from one language to another, including from programming languages to human language. It's actually a very good explainer of hairy code or difficult technical concepts if you're new to an area of the code base. And so that's been a delightful surprise: this is not just good for AI-generated code, it's actually useful for human developers who want to understand what code is being written, regardless of whether it's written by a human or an AI. A bad surprise has been that the initial version of Cody kind of represented this thing as a chat-like, quasi-sentient persona. And as a result, a lot of people assumed that you can ask questions of it much as you would a human. So there are a bunch of quote-unquote softball questions that people would ask of the coding assistant that were actually quite difficult for it to answer. Something like, hey, how many C files are in this code base? That's something a human could answer very easily with a simple one-line bash command. But it turns out the way that this thing works is we're essentially taking the user question, doing a bunch of code searches, fetching in code snippets, and then asking the language model to come up with the answer. And so it's actually very bad at counting files; it would just make up stuff to answer. And a lot of people would be like, oh, this thing is dumb. It doesn't even know how many language files are in my code. And so we've had to work on how to present the nature of this tool to people in order to level-set expectations. Awesome. Yeah. I think, you know, the ability to understand the code base is on the critical path to becoming a contributor to open source projects.
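For what it's worth, the "how many C files" softball question really is a one-liner for a script: in bash, something like `find . -name '*.c' | wc -l`. A minimal Python equivalent is sketched below; the helper name `count_c_files` is illustrative, not part of any tool mentioned here:

```python
# Counting C files deterministically: the kind of exact, enumerable
# question that is trivial for a script but hard for a pipeline that
# feeds retrieved code snippets to a language model.
from pathlib import Path

def count_c_files(root: str = ".") -> int:
    """Return the number of *.c files anywhere under `root`."""
    return sum(1 for _ in Path(root).rglob("*.c"))
```

Routing questions like this to a script, rather than to the language model, is one way to sidestep the hallucinated counts described above.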
So I want to come back to that as we talk about what the future on-ramp looks like. But before we get there, Jono, you know, you help a lot of folks figure out how to build and bring people into their communities. What are you most excited about when it comes to AI and open source? I think there are a number of things. One of the things that keeps cropping up in the DevRel and community world is that people are primarily using AI for content creation and things like that. And there's this big debate going on around, you know, do we want content created by AI? And my view is yes, we 100% do, because I don't want a DevRel person or a community manager or a maintainer spending 45 minutes creating a piece of content. I want them spending 45 minutes engaging with their audience, building relationships, building trust, fostering and mentoring those relationships. And what I love about AI is there's an opportunity to really automate, or at least speed up, a lot of that kind of manual work that goes into building great communities. And the other thing that I think is exciting about it as well is just tremendous insight. We've got a member in the Community Leadership Core, for example, that's dealing with massive amounts of Slack data. And in next to no time with ChatGPT, you can actually identify patterns in that data that you can then use to make decisions to build better engagement and scale it up. To me, we shouldn't need people who've been building communities for 20 years to help us build communities. We should be able to use these tools to give us wisdom. I don't want to see dashboards. I don't want to see numbers and graphs. That's not interesting. What I want to see is a tool saying: this is the next step for you in what you're doing in your open source project. And we've never seen that before, until we had AI. And so, realistically, how much are you seeing in the wild?
Are people already embracing and using these tools to help fuel the growth of their communities? I'd say it's getting going. I think people are using it right now for the obvious use cases, content being one of them. A lot of people are generating content. But what they're doing is using it as a tool to save them writing words down, which to me is not the right way to do it. Because, as anyone who's used ChatGPT will know, if you ask it to create a blog post, you're going to get garbage. But when you feed it information, so for example, one of the things I do is I take transcripts of videos where I answer questions for my members, I feed it the transcript, and then I ask it to create content for me. And the results are brilliant, apart from statistics. It's terrible at statistics. So I'm starting to see those kinds of use cases, and data analysis. Beyond that, not a huge amount going on. So not code writing, not doc writing; it's more about content and that part of engagement. No code, some docs. And I also want to see impact in translations as well. When we were at Canonical, Ubuntu was translated into something like 160 languages, all manual labor. We should be using AI for that. Yeah, great point. And Beyang, what are you most excited about when you think about the impact that AI can have on open source? Yeah, actually, just first off, you mentioned Slack engagement as a use case. If anyone has created that kind of AI Slack bot yet, please let me know, because that's something that I want: answering all the questions I answer in Slack day to day, and also for our contributor community. I think another cool use case would be having a bot in Discord or Slack that can answer commonly asked questions that people have. We get all sorts of people coming to our community either wanting to use Cody or contribute to Cody. And there are just not enough humans in the room to answer all the questions.
But in terms of what I'm excited about for AI's impact on developers and open source at large, the main thing that excites me is that when I think about my job as a programmer, I would say somewhere between 90 and 95% of it is actually a form of toil. You know, there's all the annoying stuff that you would rather not do. For me, it's things like all the difficulty of figuring out where something lives in the code base, right? If you asked me to implement a simple issue, oftentimes it's not actually that simple if you don't have the context or the awareness or the familiarity with that code. Or maybe it's something like, you know, if you're a maintainer reviewing a 100-file pull request: someone's put all this work into building this fancy new feature, but now you are the one who didn't get the enjoyment of actually writing the feature; you just got to review the huge PR. And so those things of toil, I think, are perfect matches for AI. Anything that can be reduced to a very non-creative, almost mechanical task, go through and do this thing, it's similar to something that you or someone else has done a million times before, go and essentially do the same thing but pattern-match it for this specific use case, that's a perfect AI use case. And I think that the more mature the technology and the AI coding tools become, the more of that toil we can automate away. You mentioned bots on Slack. This does, in my mind, present a problem though: the other day I emailed Angela, who organizes these events, and her team responded so immediately that I honestly thought they were a bot. They're so efficient. They're already among us. Yeah, efficiency may be penalized as we move forward. The LF events team, always on point. We love you all.
Yeah, I mean, one really hot topic that we'll explore throughout the summit too is maintainer burnout. I know it's something I think about a lot, given how much of our world today relies on these open source projects. How can AI tools help? How will they help alleviate some of the pressure and stress and just burden on maintainers? Who wants to jump on that first? I think this is going to be an interesting one, because I think burnout is a uniquely human problem. And when we, and I'm not a psychologist by any stretch, but when we unpick it, I go back to the burnout cycle, which was a study done a number of years ago, published in Scientific American Mind. And there are, you know, layers and layers to burnout; nobody burns out immediately. And this is something we see all the time with maintainers in open source communities, right? It's a series of paper cuts. Where I think AI will be interesting here, in the short term, will be identifying patterns in behavior, and even potentially sentiment analysis, which has a set of questions attached to it, where we can start seeing the signals. Right now, and we've all experienced this in this room, when you see someone you work with, or a family member or a friend, and they're burning out, you spot the signals, but it depends on us spotting the signals. And I think there's some opportunity there for AI to have some tripwires, where we can see that and then we can step in as a human. But I actually think, when we go into the future, though not necessarily a Terminator 2 future, we're already seeing right now AI have this amazing ability to do role play. 
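As a toy illustration of the "tripwire" idea, and nothing more than that (the word list and threshold are invented; genuine sentiment analysis would use a trained model plus the behavioral signals mentioned above), a lexicon scorer over a maintainer's recent messages might look like:

```python
# Toy illustration of the burnout "tripwire" idea: score recent messages
# against a small negative-sentiment lexicon and flag when the total
# crosses a threshold. Lexicon and threshold are invented for this sketch;
# a real system would use a trained model, and a human should always be
# the one who steps in when it flags.

NEGATIVE = {"exhausted", "overwhelmed", "quit", "tired", "frustrated", "burned"}

def negativity(message: str) -> int:
    return sum(word.strip(".,!?").lower() in NEGATIVE for word in message.split())

def burnout_tripwire(messages: list[str], threshold: int = 3) -> bool:
    """Flag when negative-sentiment hits across recent messages pass a threshold."""
    return sum(negativity(m) for m in messages) >= threshold

recent = [
    "I'm exhausted, the issue queue never stops.",
    "Feeling overwhelmed by review load this week.",
    "Honestly tired of repeating myself in triage.",
]
print(burnout_tripwire(recent))
```

The point of the sketch is the shape, not the scoring: the signal detection can be automated, but the intervention, as the speaker says, stays human.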
Like, this is an interesting thing you can do right now with ChatGPT: if you say to ChatGPT, "you are a..." and then you describe that persona. When I was launching the Community Leadership Core, I said, "you are a business coach with expertise with founders who are familiar with open source and early-stage startups," asked it a series of questions, and the results were profoundly interesting. We'll get to a point, I think, where AI will be good enough to start providing some level of guidance when it comes to burnout, but of course there are a lot of ethical questions around that as well. Going back to what we talked about before, about this on-ramp to open source: back at GitHub I used to talk about how we had this huge developer shortage, and where are the next 50 million developers going to come from, and how do we build the right on-ramps to open source? There have been a few comments about translations, for example, from an accessibility standpoint, but what do you think beyond that, in terms of how we can leverage these tools to bring more folks into open source? I think it's a really good question. Maybe a hidden part of your question is a question that's on a lot of people's minds, which is: is AI going to expand the number of developers in the world, or is it going to decrease the number of developers we need, because it gets much more efficient to create software? And personally, I'm on the side of it's going to vastly expand the number of developers, because I think that the demand for good software far outpaces the pace at which we're able to supply it right now. 
If you look at what software is available today, I would actually say most software that people use is not very good, to be quite honest. There are tons of bugs, there are usability issues, and by far the limiting constraint is the number of talented, highly qualified people in the world who can speak programming languages effectively. So I think the impact of AI is, in effect, going to loosen that constraint. What we're going to see is both an influx of people coming into the field of software creation, people who don't necessarily have four-year computer science degrees, who are now enabled to create software just by being able to speak natural language to an AI interface, which then does the translation into programming languages; and also a point of leverage for people with experience, senior developers who can now project their influence out into the world, into their code bases, much more effectively, because they don't have to manually review every single pull request. They can use these tools as a leverage point to get visibility into what's going on in the code and to ensure there's coherency and consistency to the code base as a whole.

Yeah, a question that I have been pondering a lot is, as code-gen models get more sophisticated: I hear you on the end products, right, what are people building, what's the end result. But a question is, what happens in the middle? If you look at the number of npm modules, or pick a category of open source: do we actually need as many of the building blocks, or are models going to coalesce around a specific set of best practices and perhaps shrink the building blocks (by shrink I mean shrink the number of them), and it's the final outputs that harness human creativity to create, you know, different apps that we might need for a million different things? How do you think about that? In other words, maybe there'll be fewer open source libraries and more end-user applications, because most new entrants into the field will probably want to create applications, since that's the thing that's more tangible to the user. And maybe the adoption of models for code will reduce the need for effective abstractions. Exactly, which is what libraries have traditionally done. Exactly. Yeah, I mean, modularity is a human function, to understand code in small chunks and reuse it, right?

Although to me this does beg the question: right now, without picking a specific number, a developer can crank out a certain amount of code and understand a certain amount of code. To your point earlier on, we're going to get to a point where, with AI, developers are generating way, way more code, and I think that's going to create a visibility issue. Everybody's talking about code generation, but to me a big chunk of this is really understanding code, how it operates and how it works, and then pairing that with efficiency, right? For the old people in the room like myself, we all remember what it was like when you could take a game like Doom and fit it on a floppy disk. You can't do that anymore, right? So if we're generating all this code, there's a performance issue as well: performance in terms of understanding the code, as well as just operating code quickly. But how long will it be that we actually need to understand all the inside bits of the code, when you think about the next generation of developers coming online? Maybe an extreme version of your question is: do AI coders, coding models, whatever you want to call them, potentially eliminate the need for a layer of abstraction? Yeah, right, because now your AI coding tool can speak, I don't know, a low-level language, and so we don't need the high-level framework anymore to make the low-level bits more accessible to a human. I think that's entirely within the realm of possibility. Maybe there are certain pieces of software infrastructure for which there's less demand now, because the process of writing, I guess, toilsome code, code that's similar to code that already exists somewhere out there, has gotten a lot cheaper and easier with the advent of AI coding models. Yeah, I'm not sure I agree, because I just think the psychology of human beings will always be there. Right now, nobody cares about the machine code that GCC is spitting out, or the assembler, because that's a very targeted set of abstractions. But any large organization today, when they bring in a piece of software, they do a security audit and things like that, and I just wonder about that. I think technologically we'll get there, but my take is that human beings, us, we're going to be the slowest link in the chain, right? Yeah, and human anxiety, insecurity, fear, especially as we face more cyber threats as well. I suspect that may hold us back, which will be an interesting moment in technology as well. I'll make one prediction. I don't know whether this will eliminate the need for your favorite web framework, or whatever layer of abstraction in whatever ecosystem is yours of choice, but I think this will enable the ecosystem to iterate more quickly on those abstractions. Because one of the things that happens now is this sort of ossification over time: any adoption of a new framework or abstraction means you're essentially taking a platform dependency on that API, and the switching cost is non-zero. In fact, right now it's very high to port your application from one framework or one system to another. I think, as AI
matures in coding, these kinds of large-scale code transformations, where you say "I want to migrate this thing to that thing," will become a lot more tractable, because the tedious task of translating, you know, Angular to React, or vice versa, will become a lot easier. You'll have a bulldozer you can use now, instead of hacking away at stuff with a pick and shovel. Yeah, the bulldozer of the future. Exactly, you heard it here first. I think we're at time. We could go on for hours; there are so many interesting questions I think we all have as AI continues to evolve. So I hope you all continue to be part of the conversation throughout the summit and as time goes on, but thank you very much, thank you to both of you for coming. I really appreciate the time.

All right, our next speaker is very qualified to answer all of those previous questions as well, but is here to talk about a different topic. This one I'm actually really excited about, because I never thought I would see the day where a cloud vendor is open sourcing a technology that enables multi-cloud to actually work. But that is what our next speaker is going to do. Mark Russinovich, the CTO of Azure, is here to tell us about a new open source project, Radius. Please welcome to the stage Mark Russinovich.

So I'm really excited to share with you Project Radius this morning. First I want to thank Jim and the Linux Foundation for giving us this platform to share this exciting news with you. The fact is, though, that this isn't the first time that Microsoft or Azure has contributed something to open source that works across different platforms. In fact, Project Radius comes from the Azure Incubations team, which I lead, which has several projects that are already part of the CNCF, and some of you might be familiar with some of these. How many people have heard of KEDA? That was the first project that we created, about four years ago, when the team
was just two people. Since then we've submitted several other projects, which are hopefully on the verge of graduating as projects in the CNCF. One of them is Dapr. How many people have heard of Dapr? That one is designed to work across all cloud platforms; in fact, when we first launched it, we supported all three cloud platforms, all three major hyperscalers, that is. And then the most recent one we just submitted to the CNCF was Project Copacetic, which is aimed at patching container images for better security.

But the journey toward Radius actually started even back when I started in Azure in 2010, as I looked at the evolution of cloud native computing, especially the rise of Kubernetes. What we've seen is that it's become more and more difficult for developers to build applications. Back in the old days, it was my three-tier or two-tier SOA application. Today it's microservices, and microservices that are complex to monitor and manage, and people have to continuously update and operate them, which they didn't really have to before, in the old waterfall days of software development. Troubleshooting them is difficult because of all of the systems that are interacting to support an application. Most enterprises are having difficulty enforcing best practices inside of those applications, making sure that developers fall into the pit of success. And then finally, developers now have to worry about not just running on a single box, but running across different environments: making sure the application works in the on-prem environment they've got, making sure it works on the cloud provider they've got. Many enterprise customers have multiple cloud providers, where they want to make sure that the tooling is consistent across them, and that they can take applications or components and run them across them. And this is exactly what Radius was designed to do, recognizing that Kubernetes has become
the de facto cloud application infrastructure. When I go talk to enterprises about what their strategy is for cloud native computing, they actually answer with one word: CNCF. That's almost always what I hear: my strategy is CNCF. And at the core of the CNCF, of course, is Kubernetes. They want Kubernetes because it works everywhere they want to deploy their applications, and it's got this thriving open source community behind it. It works on Amazon, Azure, Google, on-premises. But the fact is that Kubernetes only solves part of their problem, and so they've got to turn to a ton of other tools to finish out what it means to create a cloud native application, not just the compute parts of it. They go to Helm charts to be able to configure an application and its infrastructure, and that has to be transformed into the other pieces of the application that aren't just running on the Kubernetes cluster, like the cloud services the application depends on: managed services from the cloud providers that I mentioned, open source services that they might deploy in their on-premises infrastructure. All of this is tied together with baling wire and duct tape, through bash scripts and PowerShell scripts, and so creating an application has become just a jury-rigged kind of exercise. Not just that, but once you deploy the application, this is what you're left with: you see the infrastructure merged in with the application, and you don't see anything about the relationships between the resources. Find the front end here, find the back end, find the cache that the front end is using. You just don't understand what's going on, which again is another challenge when it comes to troubleshooting this thing, or understanding where the dependencies are that might cause the application to have performance or reliability issues. So that's what Radius, like I said, was designed to address, all of these kinds of challenges, trying to simplify the job of developers and make it possible
for the operators, or platform engineers, to make sure the developers are falling into the pit of success as defined by their own enterprise, because every enterprise has its own definition of best practices. Not just that, but developers oftentimes aren't the ones deploying and operating an application. The application might go to different environments: it might go to a region in the US where you've just launched the service the application is supporting, so it requires a very small footprint, but over in Europe it requires a very large footprint, because you've been operating there for a few years and you have lots of customers. The developer doesn't want to have to worry about those concerns; it's the people operating the applications who are servicing them. So Radius was designed from the start to enable this collaboration between a developer and a platform engineer or IT ops person, which in some cases, if you're full stack, might be the same person, but being able to clearly delineate: here's my application, and here's the infrastructure. It supports the creation of that infrastructure through something called recipes, and this is where the best practices get defined. And then when you have the output of this, you have an application graph, a graph that shows you the relationships between the compute components of the application and the managed services, or services, that it requires to run, understanding those relationships, and being incented to leverage Radius's support for it, because it's actually going to do things for you that are very helpful, that you would otherwise have to do by hand. And then finally, like I said, just about every enterprise customer I talk to worries about deploying an application to an on-premises environment and to a hyperscale public cloud, and like I said, many of them support multiple. So right off the bat we knew we had to make sure Radius supports all these environments, not just Azure or just Kubernetes. Now,
just to give you another level of look at the way Radius works: an application developer or architect defines the structure of their application. They define it in their terms; they understand a gateway, a front-end container, a back-end container or microservice. The front end requires a Redis cache; the back end requires a state store to store the state of the application, in a Mongo database, for example. Then they, or an IT ops person, define an environment, basically a landing zone for the application. The environment is configured through something called recipes, which bind those components of the application to the infrastructure, and it does this dynamically. So you can take the application, in this case, and bind it to a local Kubernetes environment, using the recipes for a local Kubernetes implementation of Redis and Mongo. But then you can also swap that out. That might be your dev/test environment; go to a production environment on Azure, and the recipes for that environment bind that infrastructure to Azure Cache for Redis and Azure Cosmos DB, implementations of those services that are native to Azure, running, of course, on the managed Kubernetes service in Azure. Then create another environment in another cloud provider, AWS, and bind, through recipes, to native resources or services for that cloud provider, MemoryDB and DocumentDB, and deploy into EKS. Radius makes this possible with the developers, or the IT operators, not having to change the description of the application, being able to take that as an immutable artifact and bind it to the infrastructure. Now, Radius, for the application, consists of a core set of resources: the components that describe the compute-based microservices, the containers, the connections between the containers, a gateway, and a secret store, the fundamental core building blocks of an application. It supports a set of standard resources, and so I mentioned Redis and Mongo in that example that
I gave you, but it supports several other standard resources out of the box right now, including Dapr. And with Dapr-specified resources, now you have true portability across different environments, not just for the code but also for the application definition. It supports managed resources from AWS, Azure, and Kubernetes right off the bat; we're going to be adding more for GCP and Alibaba and other cloud providers going forward, and we hope others will help contribute to that. And then finally, as far as landing environments, it supports Kubernetes, like I said, any CNCF-certified distro, including the managed Kubernetes implementations across these cloud providers.

Now, to give you a deeper look, we've got some demos here, from two perspectives. One of them is the platform engineering perspective, and I've got two Ryans here to show you that; they both happen to be named Ryan. The first Ryan I'm going to bring up on stage is Ryan Umstead, who's going to be the platform engineering persona. He's from BlackRock, and he's going to show you how he creates the Radius environment, with recipes, where the application that the other Ryan deploys is going to land. So, Ryan.

All right, hi everyone, my name is Ryan Umstead. I'm a senior engineer at BlackRock, leading a platform engineering team, and we've been working with Mark's team to help shape the direction of Radius. In today's landscape of ever-evolving cloud complexities, it's imperative to streamline our application SDLC. It's essential that our internal developers can rapidly access cloud resources while meeting the standards of a highly regulated financial industry. We see Radius as a way to enhance collaboration between our development and platform teams through its unique offering of Radius recipes. The platform empowers developers to tap into vital cloud resources, like Kubernetes and storage solutions, without the need to
grasp the intricate details of the underlying systems. For example, when our developers need some kind of cloud storage resource, like a Redis cache, recipes ensure that the cache meets the cost, operations, and security requirements that BlackRock has. Our engagement with the Radius team stems from our advocacy for open source solutions within our own technology platform, Aladdin. We believe this approach holds significant potential to resonate with the cloud native community.

Let me walk you all through a demo. Here is a sample my team has created as part of our experimentation with Radius. This recipe uses the Bicep infrastructure-as-code language, open sourced by Microsoft; I could have just as easily used Terraform. Here you can see I've defined an Azure Redis cache, which can be consumed by any of our developers. Let me walk you through some of the specific parameters that help us make sure we're meeting our cost, operations, and security concerns. On lines 8 through 16 you can see we're selecting a standard SKU; we can adapt that parameter based on our compute or budgeting requirements. As another example, we've set enableNonSslPort to false, for encryption in transit. There are a bunch of other parameters in here to ensure that the Redis cache gets deployed as we require. The developer of an app won't have to understand any of these configuration choices; they just need to call the recipe, and the cache will be up when their app is deployed. Mark mentioned that one of the things recipes do is connect the application to the resource created by the recipe, or bind it. They should feel magical to developers, since they are not only getting a resource provisioned, but their app is automatically connected to it. Recipes are able to wire up these connections using the result object lower on the screen. As you can see, there's a host, a port, and a password parameter that gets fed into the application, so the application knows which host to connect to, the port, and
the password, as expected. Now that we've set up this recipe, let's store it and register it with a Radius environment. As you can see, I'm running a command to publish it; we're publishing it into an OCI-compliant registry, which means organizations can use the registries they're familiar with today. And just like that, it's published. Next I need to register the recipe with a specific environment. Radius environments enable a separation of concerns between our developer and platform teams: we want our developers focused on creating their applications, and my team can handle the configuration of the environments. These environments can include a variety of cloud resources, like a cache, messaging queues, or any other service required by the application. Since BlackRock deploys its software to many regions across multiple geographic locations, we can use environments to handle those regional variances. For instance, my Redis cache may be larger in Europe than in North America; the recipe attached to each environment encapsulates that difference without the developer's code or configuration changing. Earlier I created an environment called Aladdin test West US 2, and now I'm going to register my Redis recipe with that environment. And with that, I'm all finished. I created a recipe to deploy a Redis cache; that recipe outputs the data necessary for Radius to wire the cache into a developer's application. I published that recipe into an OCI registry and then registered it with a Radius environment. At this point the recipe is ready to use, and I'm going to hand it off to Ryan Nowak, who is going to share how developers use recipes when they build their applications. We've actually got another Ryan in the back in case one of us has a bug. This is true.

Hi everyone, my name is Ryan Nowak. I'm a developer on the Azure Incubations team at Microsoft, and I'm a creator of Radius. As part of building Radius, we talked to over 70 cloud customers about the challenges their developers face when
managing applications. Our conversations highlighted the complexity developers face working with Kubernetes and cloud resources, and especially working with them together. What we heard is that it's a pain for developers to get access to cloud resources like databases, and it's doubly a pain to wire those up and troubleshoot access problems. If I'm having some kind of problem, I'd likely need to have a back-and-forth with someone like Ryan, and it could take a long time to figure out. This is why we created recipes: for on-demand provisioning that follows the organization's policy controls. In this demo I'm going to show you how I can use Radius in my existing Helm chart, deploy that to a local environment on my workstation, and then deploy to the cloud using the environment that Ryan just created for me. This is an example of how Radius can help platform engineers and developers collaborate and use the cloud in a way that meets all of our requirements and standards. I've already set up my configuration locally for a dev environment and to use the Radius environment on Azure that Ryan created for me. So first let me show you the application, and this is just a normal Helm chart. This is a to-do application that I've already created; we're starting with an application that's already created and containerized, because Radius works with your existing code and containers. Now, this to-do application needs a database, and in this case I'm going to use a Redis cache. I think in an enterprise environment many of us would need to file a ticket, or call somebody, or ask for access, or maybe there's a portal. Instead, I'm going to use a recipe and have Radius create the Redis cache for me. I can do that by just adding this little snippet of YAML here. This is a CRD that we've defined inside of Radius, and you can see down at the bottom that I'm asking for a Redis cache, and that matches up with the way that Ryan registered the recipe. When I
deploy this to Kubernetes, Radius is going to use the recipe that's configured in that specific environment to create the Redis cache. So in my cloud environment, it's going to use the recipe that you just saw, and in my local dev environment, it's going to use a recipe that comes with Radius. When we set up these local dev environments as part of Radius, there's a bunch of recipes that are part of the open source project, and they just run popular technologies on your containerized infrastructure, so you don't need a cloud account to get started. So I've got my Kubernetes deployment and my recipe for Redis, but I need to say something about how they're connected to each other, so I'm going to add some annotations here. First, I'm enabling Radius for this deployment, so Radius will be aware of it and process it. And second, I'm declaring the connection: I'm saying that the pods created by this deployment need to be able to talk to that Redis. Radius will use this connection information to inject settings into the application. If this were a different type of resource, like an S3 bucket or an Azure storage account, Radius might do things like configure networking, managed identity access, or IAM permissions on AWS. Again, we want this to feel like magic to developers as much as possible. Also, since Radius knows about the connection, it's going to use that information to catalog the relationship and the infrastructure as part of the application. This contributes to what we call the application graph, and the idea is that everyone in the organization, not just the people who work on the application, can have a shared picture of what's in that application and its architecture. So now I'm ready to try this out in my local dev environment, and I can just deploy it like any other Helm chart. You're going to see a terminal pop up here; I'm just running a normal helm install, and that's done. I'm going
to fast forward a little bit to the point where everything has been set up, and again, this is just my local dev environment. You can see the context is testing; this is the environment where I'm trying this out on my workstation, and we're just querying the status of everything. If you don't understand this, that's okay; I think Kubernetes is a little complicated, and this is a lot to fit on the screen. But I want to highlight, at the top, the two pods there: one of those is running Redis, and that was created by the recipe, and then there's my web app container that I asked for. Down at the bottom you can see our recipe CRD, and the status is ready. To explore this a little bit more in detail, let's go look at the application graph for this, which I can get with the Radius CLI. At the top you can see our container is defined, and we have the connection from the web app container to the Redis database. At the bottom you can see the Redis cache, and we understand that there's a connection coming in from the web app. And what's going on with these resource sections is that we're cataloging all of the infrastructure. In this case, since we had a local dev recipe that just self-hosts Redis on Kubernetes, you can see that we've cataloged the outputs of the recipe there. This is a text-mode version of it; we're working on an actual visualization, which I think is a little easier to get your head around. Let's quickly prove that this worked. Since this is running inside a Kubernetes cluster, I'm going to open a port forward, and once that's open I can pop open my browser and start testing this out. On this screen you can see the settings that were injected by Radius: the URL, hostname, port, password, all the things that the application code is going to need to be able to communicate with that cache. And then on this page I'm just going to quickly prove this works by
testing it out. So you can see that's completed, and we've been able to work with the app. Now, before I go to the cloud, just one more thing. Mark mentioned that all the enterprises we talked to are multi-cloud, and during our customer conversations this came up a lot. A lot of enterprises today are going through this sort of platform engineering transition, and we found when we talked to them that they all see platform engineering as a multi-cloud endeavor. That's just the reality: platform teams in large enterprises need to support all the places where they need to run code. And we know that Kubernetes is ubiquitous; it's everywhere. As we said, it's kind of become the common app runtime for the industry, and in part Kubernetes' success owes to the fact that it works the same everywhere, the same on-prem as in every cloud. So it's a great leveler, and it'd be awesome if more of the tools that we use as developers worked the same for every cloud. As an open source project, Radius is embracing the multi-cloud reality that we live in. So even though we started this project at Microsoft, we built AWS support into Radius, and we're going to work with the community to continue to enhance it over the coming months. We'd love for more of you to join us and help build support for more things and more clouds. We also know that many organizations have a deep investment in Terraform, and we want to make sure they can continue to leverage that investment with Radius. So here's an example of a Terraform recipe for AWS. What I'm doing here is I pulled something from the public module gallery, that's the source reference there by the cursor, then I customized some things I wanted to customize and wrapped it up into a recipe. This just serves as an example of how you can use your existing Terraform investment with Radius, or you can use existing open source Terraform modules if you want. By the way, we saw the Bicep
infrastructure as code language earlier, which is an open source project from Microsoft. We've also built Bicep support for AWS, and we're going to continue to invest in that as part of Radius. Our philosophy is that we're unopinionated about the kinds of tools and infrastructure as code technologies you want to use, and we're going to empower the cloud native community to build whatever kinds of integrations they think are valuable. So just imagine that in the background I've gone through the same steps that Ryan has: I created an EKS cluster on AWS, I configured an AWS environment, and I wired up this recipe. And then remember that I've also got that Azure environment that Ryan set up for me earlier. So in this next step, I'm going to deploy to both clouds. Here in my terminal you can see I've got my Azure test environment in West US 2 on top, and I've got my production AWS environment on the bottom. Again, it's just a Helm deploy; as Mark mentioned earlier, I didn't have to change any of my application code or any of my Helm charts to be able to deploy to multiple clouds. I have different recipes in those environments, and Radius is going to swap the infrastructure for me. Just to prove this works, I'm not going to bring up the app again, but let's look at the application graph output for both of these clouds, and we'll see a little bit of how it's different. I realize there's a lot of text to fit on the screen, so it's maybe a little bit hard to read. The thing I'd ask you to focus on is the Redis cache there, and you're going to see different results. In our local test environment we're just running in Kubernetes, but here you can see we've got that Microsoft.Cache/Redis resource. This is again using the recipe that Ryan wrote for me and provisioning that Azure cache in the way that BlackRock wants it to be provisioned. On the bottom we executed that Terraform recipe, and we got an AWS MemoryDB cluster, which is one of their hosted services for Redis. So to wrap up: Ryan
Amstead wrote a recipe for Azure and configured an environment for me. For platform engineers like Ryan, Radius enables them to provide a self-serve provisioning experience to their development teams, and it's going to create all the cloud resources in a way that enforces their standards and best practices. For me, the developer, Radius has given me a simplified interface to the cloud, and it works with tools like Helm that I'm already using. I can be confident that the cloud resources I need will be created in the right way without becoming an expert myself, and I can also be confident that my dependencies will be wired into the application in a way that makes me more productive. Lastly, Radius is going to automatically catalog the infrastructure, relationships, and architecture and give the whole team a shared picture of the application. So we're going to bring Mark and Ryan back up here and wrap up. You've seen, I think, a demonstration of how Radius is setting out to solve those problems that I discussed at the beginning: by separating concerns between the platform engineers and the developers, by supporting multiple clouds, by giving a graph-based view of an application architecture, and by supporting many different environments, on-premises Kubernetes as well as multiple clouds. And as you can probably imagine, Radius is following in the same footsteps as the other Azure Incubations projects, and I'm excited to announce that yesterday we submitted Radius to the CNCF, which is why we're here. Obviously we want everybody to join us and to flesh this out. As Jim was saying, we're humble about what we're doing. We don't think or believe that we have all the answers here; we don't think we've got it all figured out, and there's a tremendous amount of work to do to really make this address everybody's needs. Everybody's
going to have slightly different requirements and different preferences about how Radius works, so please join us. There you can see the links to the landing site for Radius, where you'll find the documentation, tutorials, and source code for the to-do application you saw, and there's the GitHub repo where you can hopefully come join us and contribute. So thank you very much, and thanks again to the Linux Foundation and Jim for giving us this opportunity to share this with you. Thanks, Ryan. How many people are going to be at KubeCon in a couple of weeks? All right, so for those of you who are coming, which looks like most of you, and for those of you who are going to be seeing this online, I'm sure you can learn a lot more about Radius at KubeCon. I suspect that Microsoft will have a lot of information about Radius in their booth at KubeCon, so definitely go check it out. My only critique to the two Ryans and Mark is that you missed a wonderful opportunity to name this Radius, just saying. So let me introduce our next speaker, Bryan Che. You know, we've been hearing a lot of talk this year about artificial intelligence, and Bryan is going to talk to us today about keeping open open when it comes to AI. Please welcome Bryan Che, the Chief Strategy Officer from Huawei. Morning, everyone. It's great to be back here and be in this room with all of you. So there have been a lot of discussions, and a lot of, you could say, controversies over the last few months in terms of what is really open. We've seen a lot of changes over the last year or so in terms of how we think about open source technologies and the communities that we're engaged in. We've seen things that were previously available under different open source practices that are no longer available for us to download and use. We've seen license changes for different products and technologies that we depend upon, and so whether they continue to remain open is a big concern. We have a lot of new technologies in the
AI space that claim to be open, but when you take a look at the licensing behind them, they aren't open at all. And we even have a lot of different regulations around the world, all looking to regulate, for very good reasons, around cybersecurity, but that start to place restrictions on how open source software can be developed. So I wanted to take a few minutes this morning to talk with you all about, when we think about what it means to be open, what are the things that really count, and how do we work together to make sure that we can protect this community that we all work in? Because if we don't ensure that open really remains open, it's going to create a lot of problems for all of us, both in this room and around the world. So let me start with a little bit here. When we say something is going to be open, first of all, it has to be open source, and by open source we mean under an OSI-approved open source license, full stop. This is really fundamental, because we've seen very recently, especially with a lot of the AI technologies, all sorts of arguments around how we need to keep things more open than they used to be, but if it's not fully open source, it creates a lot of problems in terms of how we use the technologies. So for example, earlier this year we released a new open source project called Kuasar. This is focused on how we build different sandboxed container runtimes, so whether you're using an OCI container, WebAssembly, or others, how do we build these heterogeneous infrastructures for cloud native workloads? This kind of technology, if it's not fully open source, you can't depend upon it; you can't build your applications on it; you can't build your business upon it. So we released this under the Apache license, but in a sign of the times, we actually had to put on the website: this project is free for personal or commercial use, absolutely no restrictions at all. This isn't something that we
necessarily had to think about doing in the past, but it's very important that when we look at how we build open technologies, we still build on open source licenses. Another key aspect of being open is that we put things under open governance. So for example, another project that we donated here to the CNCF is our KubeEdge project, and we've had a lot of success with our deployments. You've maybe seen at some of the KubeCons where we have this out in satellites in space, or in automobiles around the world, and so on. One of the important things for us is that within the CNCF there's an IP policy as part of the charter. Nobody likes to read these documents, but it's very fundamental to what we do, because the IP policy says that code must be open source, it cannot be re-licensed, and it cannot be withheld. When you think about some of the issues that we've been talking about today, whether about re-licensing or withholding different code or different artifacts around projects, this is really important, because when we put things into open governance, we share the responsibility and we share the access, and no company can unilaterally make decisions around how we keep things available for access. It's very important that we have open governance around our projects as well. The third aspect of being open: there are a lot of companies in the open source community that really want to have their cake and eat it too. They want to be able to use open source and build end users and build technologies and gain traction, but they want to keep all the business benefits for themselves. They don't want others to be able to participate in and benefit from the open source business opportunities that come from that set of technologies. We don't think that's right. We think that when you start to work together in open source, the opportunity is to build a brand new, much bigger business opportunity that we all share together and we
all build ecosystems around. So for example, earlier this year we also open sourced a new technology called Kuasar, sorry, I mean called Expanse. What Expanse is focused on is building a set of capabilities so that across a different set of public cloud providers, you can build portable managed services that go from cloud to cloud to cloud. One of the things that a third-party company called Decision in France did earlier this year was conduct a research study asking what the impact on the European cloud market would be if we were to have this project as it continues to grow, and one of the things they found was that this project would be able to support the growth of the European cloud market from its current 14% and accelerate it to 279%. A huge, huge increase in terms of business opportunity for everyone to participate in. So when we look at how we open source technologies, it's very important that we all have equal opportunity to grow in a growing ecosystem together. Another huge benefit and a key aspect of being open is that we have open participation for everybody involved, whether big or small. I want to start from the big standpoint first. For those of you who were at the Open Source Summit in Europe with the Linux Foundation, you probably saw a lot of news and headlines around the Cyber Resilience Act, the CRA. This is a proposed European regulation which will have a big impact around the world in terms of how we develop open source software. It was designed with very good intentions: how do we protect our software supply chains and make sure that they're usable and trustworthy? A very noble goal. But intentionally or unintentionally, it also puts regulations on open source and open source foundations, whether they're located in Europe, in the US, in China, or anywhere else in the world, in terms of how open source can be developed and how it can be made available. And so one of the things
that we did as a company was bring to attention that this is something that needs to be changed in order to support open source organizations and their foundations, if we want them to be able to continue to provide the valuable service that they do. I would encourage you, if your organizations have not taken a look at this, you definitely need to pay attention, because it will have a big impact both on your products and technologies and on the communities that we work together in. And then, on truly open participation, I just want to end on a little bit of a personal note in terms of how I participate in the open source communities today. Starting on the left, this is from about when I first started getting involved in open source, still as a university student. Back then there was a brand new operating system out called Linux, and I was in the process of learning how to set up my own PC in my dorm room, and I was asking all these questions about what to do. This is an old email I found where I was asking how to find all the ways to get root access to a Linux server. I won't explain what kind of trouble I was getting into or why I wanted root access, but the thing that I really loved was that I could go into the community and ask for help, and so many people were willing to help me. They would answer my questions, they would support me, I became a developer, and so on. This was really my first exposure to open source and the community that we can all participate in. And if we fast forward to where I am today, I'm in a little bit of an unusual situation. I was born and grew up in the U.S.,
and a few years ago I moved to Hong Kong to work for Huawei in China, and I spend about five or six months a year in Europe. When I look at all that Jim was showing in his slides earlier, the memes about what people think he does: I spend a lot of time on the road, and a lot of people often say they're envious of the opportunities I have. But for me it's very important, because in today's global environment there are a lot of challenges to how we work together from an overall global standpoint in open source. I have a lot of stories, whether I'm in the U.S., in China, or in Europe, about some of the complications in how communities come together, and one of the things that I have the privilege to do is to build bridges across all these different communities. I think open source is a wonderful way to do that, because we're all focused on how we come together, how we collaborate, and how we build something for an overall global good. For me this is a very important thing. It's very challenging; my hair used to be black three years ago, and I've seen so many things I never imagined I would see. But it's very important, as we come together from an open source standpoint, whether you're looking at big global communities or just the individual participation of people such as myself, that we make sure open source is available and open for everyone to use. Thank you very much. Now, I'm excited about this next topic. You know, we're going to dive back into the conversation around cybersecurity, and our next speaker is the co-founder and chief executive officer of Horizon3.ai. This is a cybersecurity firm pioneering the use of AI in autonomous pen testing, and if that doesn't scare you enough, prior to this role he was the CTO for the Joint Special Operations Command in the United States. Today he's going to
come and talk to us about building a next-generation security ecosystem. Please welcome Snehal Antani. All right, thanks, thanks everyone for the time today. My name is Snehal Antani. My background: software engineer by trade. I started my career at IBM, then was a CIO at GE Capital, then CTO at Splunk, and then took a break from industry to serve within the US special operations community, the hardest, most meaningful work of my career. For those I've caught up with at dinner, I've shared some of those stories, which have been fun. And then I left to start Horizon3 as the founder and CEO, where we pioneered the use of AI-enabled pen testing, and I think it's a good follow-on to the previous two sessions that we had here. So what I want to talk about today is, first, this realization that as defenders, most of us knew very little, or know very little, about the actual details of attacks, and to become better defenders we have to invest the time in understanding how offense can be used to inform defense. The second thing is that we pioneered the use of AI-enabled pen testing, and I'll tell some stories about how the barriers to entry for conducting a high-end cyber attack have been dramatically reduced. As you heard from the previous speaker, SOCs are already overwhelmed, and AI will enable more robust attacks in larger environments, faster, so the situation is only going to get worse. And the third is that we only win through community. I believe the security ecosystem and community is fundamentally broken today, comprised mostly of Twitter celebrities trying to sell their security courses, and that the Linux Foundation, an organization I've admired for years, is uniquely postured to be the gravity and catalyst to bring that community together and do something significant. So what I'll talk about is really my experience on the buyer and user side of cybersecurity. As a CIO, the challenge we always had in the seat was: are we secure? And the answer is, I have no idea. I've got
to wait for the bad guys to show up. And before that: am I fixing the right vulnerabilities? Am I logging the right data? Are my security tools actually tuned correctly? I literally had to sit around and wait for a breach to find out, and that isn't very sustainable. I tried to go down the penetration testing route as the only viable way to identify where I was exploitable, but that in itself was an absolutely horrid experience. First, I had to go off and justify the budget. Second, we would spend weeks preparing our IT organizations for that pen test, and then they'd show up and absolutely rip us apart. I don't know if it's too soon for this meme, but you know where we were. And then this year's results looked exactly like last year's results, and this was incredibly frustrating for me as a CIO and then a CTO within the Department of Defense, because I needed to verify my security posture. My commander used to say: don't tell me we're secure, show me. And then show me again tomorrow, and then show me again next week, because our environment is constantly changing, and the enemy always has a vote and is evolving. So the security journey I went through, both in the financial services sector and in the government, was, first, assuming breach. There are way too many doors and windows that allow the bad guys to gain initial access, and what was more important was not my perimeter security but how quickly I could isolate the blast radius upon initial compromise. That was the shift from talking about being secure, which is a point-in-time state, to being defensible: the ability to rapidly adapt to minimize the blast radius and stifle the progression of an attack. The second thing I realized is that attackers don't hack in using the zero days that you see in the movies; often they're logging in with credentials that they've harvested, and credentials became the primary attack surface I had to get after. And there are no CVEs that represent
credentials; there's no open source software bug that represents credentials. Yet this is the single biggest attack surface in most organizations. The third is that I should be testing as often as, or more often than, I am changing my environment, and this is a big mental shift towards testing as frequently as you possibly can. For every Patch Tuesday, you need to have a pen test Wednesday. What we ended up evolving into was this find, fix, verify cycle, where I wanted to continuously identify my exploitable attack surface, I wanted to understand how that attack surface evolved over time, I wanted to rapidly fix and remediate the issues that truly mattered, and I wanted to verify that my security tools were actually effective. The faster this cycle, the more defensible I am within my organization. So what I'm going to do in this first section is talk through four major types of attack patterns that we've seen across our customer base. For context, we've run 28,000 pen tests in the last 18 months, and that is more than the top 20 consulting firms combined throughout their entire history, so what we've got is a lot of visibility into the common vectors for compromise. When you think about a compromise, attackers are very routine and well-defined in what they're trying to achieve. There is a well-known set of waypoints, or technical objectives, that an attacker goes after. The first is a path to becoming a domain administrator, because if they become domain admin, they've got the keys to the kingdom. If they can't get DA, can they at least compromise a host, and then stage an attack at a later point, or from that host start to harvest credentials and snowball into a bigger problem? If they can compromise a domain user, they have access to all of the data, systems, and services of that domain user. And then the initial reaction is, well, what if I have multi-factor authentication turned on? Well, that TV doesn't support MFA. That printer back there
doesn't support MFA. MFA can't be applied to lower-level protocols in the organization. So you have to really understand what your attack surface is, and which parts of your defense in depth are actually effective and which are not. The next part, from a testing standpoint: once you've got the technical objectives sorted, you want to start to understand very specific operational scenarios. Is your network segmentation actually isolating the blast radius of an attacker? Is it properly stifling lateral movement? If you start down a zero trust project, on the first day of that project an attacker has complete network reachability, because it's a flat network. Over time, though, you should see that reachability change and shrink and condense, and you don't know that until the bad guys have shown up. So how do you verify segmentation, or the blast radius of a compromised credential, or that your tools are actually working? And then, finally, how do you talk about your security posture in the language of the business? Because they don't care whether you had a container security product in place. What they want to understand is: are you effective in detecting and stifling attacks? Can their uptime be compromised and affect cash flow? Can information be compromised and lead to legal risk? And so on and so forth. Every time you update an application, onboard a new employee, or patch a server, your attack surface has changed, and so every time you make any sort of change in the environment, you've got to go in and verify that you're no longer exploitable. So let's share four stories real quick, and these are all real stories of attacks. The first one is a large bank, 5,000 hosts, so this is just a small segment of their environment, and this company had the latest EDR and UBA tools in place. They assume breach and initiate a pen test on a single host. And, like an attacker, there's a famous Microsoft quote: defenders think in lists, attackers think in graphs. The first thing an attacker is
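The segmentation question above, how much a single compromised host can reach before and after a zero trust rollout, can be made concrete with a small sketch. This is not Horizon3's tooling, just a minimal illustration with made-up host names: model the allowed network paths as a graph and count what's reachable from one foothold.

```python
from collections import deque

def reachable_hosts(edges, start):
    """Return the set of hosts reachable from `start`, BFS over allowed paths."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

hosts = ["laptop", "web", "db", "hr-files", "build-server"]

# Day one of a zero trust project: effectively a flat network,
# every host can talk to every other host.
flat = [(a, b) for i, a in enumerate(hosts) for b in hosts[i + 1:]]

# After segmentation: the laptop can only reach the web tier,
# and only the web tier can reach the database.
segmented = [("laptop", "web"), ("web", "db")]

blast_flat = reachable_hosts(flat, "laptop") - {"laptop"}
blast_seg = reachable_hosts(segmented, "laptop") - {"laptop"}
print(len(blast_flat), len(blast_seg))
```

Running a check like this continuously, rather than once, is exactly the "verify segmentation every time the environment changes" point: the blast radius should shrink over time, and you want to notice when it doesn't.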
going to do is build out a knowledge graph that represents every host, port, service, credential, and so on within the environment. Within this particular customer, they were using a well-known EDR to protect their hosts, yet we were still able to get code execution on a Windows box, successfully dump credentials, and then reuse those credentials to become domain administrator. And so the big question from that CIO was: what on earth happened? It turned out that the EDR was misconfigured on three out of the 5,000 machines. It was just bad automation, and the customer had no idea. The key point is you can't trust that your security tools are working; you've got to verify that they're working properly. And this has occurred with every major EDR and AV product out there, whether it's CrowdStrike, Trend Micro, Fortinet, and so on. Oftentimes these EDRs are misconfigured because, say, in Trend Micro there's an advanced checkbox that says "prevent OS credential dumping." Most people don't know what that checkbox does, and they don't realize that it has to be enabled to prevent critical types of credential harvesting from occurring. These tools are super complicated, so just because you bought it and installed it doesn't mean it's effective. The other part here is the customer asked: well, why didn't the credential pivot get detected and stopped by the EDR tool? Because the marketing brochure says so. It turned out the customer had purchased the wrong module from that particular vendor, and I'm happy I wasn't in the room for the conversation about what they bought and what they didn't buy, because they got charged for something. The final part here is: you've got to verify, and you're going to hear that over and over again in my talk. The second story is that attackers don't hack in, they log in. One of the very first steps any attacker takes upon gaining initial access is to start listening for NTLM
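The "attackers think in graphs" idea can be sketched in a few lines. This is a toy model, not how any real attack engine is built: nodes are hosts and credentials, a directed edge means "having this gets you that", and a breadth-first search finds the shortest chain from a foothold to domain admin. All node names are invented for illustration.

```python
from collections import deque

# Toy knowledge graph: an edge means "possessing this node lets you
# obtain that one" (dump a credential, reuse it on another host, etc.).
edges = {
    "initial-foothold": ["workstation-7"],
    "workstation-7": ["cred:helpdesk"],       # dump local credentials
    "cred:helpdesk": ["file-server"],         # credential reuse
    "file-server": ["cred:svc-backup"],
    "cred:svc-backup": ["domain-controller"],
    "domain-controller": ["cred:domain-admin"],
}

def attack_path(start, goal):
    """Breadth-first search for the shortest chain from foothold to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = attack_path("initial-foothold", "cred:domain-admin")
print(" -> ".join(path))
```

A defender's list of assets contains none of this structure; the path only appears once you connect hosts to the credentials they expose, which is why the graph view is the one that matters.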
hashes being passed along the wire. You can turn on a tool called Responder, and there are other variations of that tool, and it'll grab NTLM hashes being passed around through multicast name resolution protocols like LLMNR and mDNS. When you grab enough of those NTLM hashes, you've got enough to start cracking them, and as you crack those hashes you get clear-text passwords that match up with the user IDs you've collected. In this example, you can now use that to, say, take over Office 365 email. Now, the first question I always get is: yeah, but I've turned on multi-factor authentication for 365. What people don't realize is you might turn that on at the group or company level, but when a new employee joins the company, they still have to explicitly set up MFA, which is why attackers watch for job changes on LinkedIn and then target those new employees within the first one to three days of them joining your company, because they know they likely haven't set up MFA yet. What we found consistently, and shockingly, was that 80% of the NTLM hashes we would collect in a pen test were cracked in 15 minutes or less. And we're not using some crazy quantum infrastructure; these were standard GPU rigs. In fact, a lot of those passwords were cracked near-instantaneously, for a variety of reasons. 10% of the service accounts we found had the user ID and password as the same value: websphere admin, websphere admin; vmware admin, vmware admin; so on and so forth. And then we also found a lot of regional use of passwords. For instance, if you're from New England, it's TomBradyIsTheGoat, and if you're from Atlanta, it's 28-3 WTF. I used that in Atlanta a few weeks ago and I had to run offstage. But the point is that attackers don't hack in, they log in, and this is the primary attack vector across the board. The third story is a really interesting one of compromising a hybrid cloud environment to take over the production AWS infrastructure. In this example, initial access was
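The cracking step above is just a dictionary attack against unsalted hashes, and that's sketchable. One caveat: real NT hashes are unsalted MD4 over the UTF-16LE password, which is precisely why GPU rigs chew through them; MD4 is often disabled in modern crypto libraries, so this sketch substitutes SHA-256 purely so it runs anywhere. The users and passwords are invented.

```python
import hashlib

def hash_pw(password):
    # Stand-in for an NT hash. Real NT hashes are unsalted MD4 over the
    # UTF-16LE password; SHA-256 is used here only so the sketch runs
    # everywhere. The key property, no salt, is preserved.
    return hashlib.sha256(password.encode("utf-16-le")).hexdigest()

# Hashes "captured" off the wire (weak passwords chosen on purpose).
captured = {
    "jsmith": hash_pw("TomBradyIsTheGoat"),
    "svc-vmware": hash_pw("svc-vmware"),     # password equals user ID
    "cfo": hash_pw("x9$QvL!7pT#a2mZw"),      # strong, not in any wordlist
}

# A tiny dictionary: regional favorites plus each user ID itself,
# covering the "user ID as password" service-account pattern.
wordlist = ["password1", "TomBradyIsTheGoat", "28-3WTF"]

cracked = {}
for user, h in captured.items():
    for guess in wordlist + [user]:
        if hash_pw(guess) == h:
            cracked[user] = guess
            break

print(cracked)
```

The weak and reused passwords fall instantly while the random one survives, which is the 80%-in-15-minutes dynamic in miniature: the attacker doesn't need to crack everyone, just enough someones.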
on a single machine. How many people here use HPE iLO, or know what it is? Right, it's a virtual appliance for managing server hardware. iLO is a very difficult VM to monitor and instrument because it's a custom OS, and people don't patch it and don't pay much attention to it, but attackers know this. The same goes for Dell iDRACs and other kinds of vendor-specific virtual appliances. So after conducting recon and organizing data into a knowledge graph, we found an iLO box that was network reachable, and we were able to successfully get code execution on that box. Once again, not a single alert got triggered, because most people aren't monitoring these types of components in their network. Well, iLO stores all of its credentials in clear text in memory, so once you get code execution on that box, you're able to dump those credentials, compromise a domain user, and start to access the file shares that domain user had access to. From there, pillaging the file share, we found the keystore file that allowed us to log into their production AWS account. This entire attack took less than two hours. Not a single security alert was triggered, despite every Gucci tool being owned by this particular customer, and that's because they weren't observing the iLO infrastructure, and from there everything else was a valid login and nothing was going to get tripped. What you'll see here is that while there was a CVE used for the iLO box, the rest of it was just standard maneuver, and it looked like regular traffic. And the final story of the four that I'll bring together is how all these pieces are stitched together into a complete attack. This is essentially what you saw go down at MGM, at Caesars, and at most organizations. The first thing attackers are going to do is conduct some sort of open source intelligence from the outside. So if you want to go compromise systems on a United airplane, you're going to go to LinkedIn Sales Navigator, search the word "pilot," search for
the company United, and you're going to find there are 7,000 pilots in there. Before I go on: who here uses multi-factor authentication? Raise your hands. All right, great. And be honest on the next question: how many people reuse passwords across machines and systems? Raise your hands. All right, and how many people, be honest, use personal information in their passwords? Raise your hands. You all just failed your security training and awareness. All it takes is one person to reuse their compromised Netflix credentials as part of their corporate email, just one. So if I've got 7,000 pilots, one of them is going to reuse credentials that show up in a breach database, which I'm going to be able to find and use to log into their corporate email, and their corporate email is first initial, last name, at united.com. So through basic open source intelligence, I've now got 7,000 potential user ID and password pairs. From there, I'm going to password spray to compromise a domain user, and in most situations the domain user is also the local admin on their laptop, which means they can install applications and configure the system themselves; that's very typical in most large companies. Well, if I'm the local admin, I'm going to be able to dump the SAM database and grab those NTLM hashes, of which, as I said, 80% are cracked in 15 minutes or less. I'm going to reuse those credentials to gain access to neighboring machines, and from there I'm eventually going to find a critical credential, whether it's a domain admin credential, a service account credential, or so on, to compromise the domain and have the keys to the kingdom. And there is no magic AI button to prevent this from happening. It's the basics: poor password policies were implemented, local admin privileges weren't properly secured, Active Directory was too permissive, and so on. Instead, we talk about AI for defense. User behavior analytics products, in 28,000 pen tests, never stopped us from compromising the environment, because
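The OSINT step described above, turning public employee names into candidate logins and intersecting them with leaked credentials, is mechanical enough to sketch. Everything here is fictional and for illustration only: made-up names, a made-up domain, and a made-up breach dump.

```python
# Sketch: public employee names -> candidate corporate logins -> overlap
# with a (fictional) breach database. All data here is invented.
employees = [("Alice", "Nguyen"), ("Bob", "Ortiz"), ("Carol", "Smith")]

def corp_email(first, last, domain="example.com"):
    # The pattern from the talk: first initial plus last name.
    return f"{first[0].lower()}{last.lower()}@{domain}"

# Fictional breach dump mapping leaked emails to leaked passwords.
breach_db = {
    "bortiz@example.com": "hunter2",
}

candidates = {corp_email(f, l): (f, l) for f, l in employees}

# Any overlap is a ready-made user ID / password pair for spraying.
spray_list = [(email, breach_db[email]) for email in candidates if email in breach_db]
print(spray_list)
```

With thousands of names instead of three, the point of the talk holds: the attacker only needs the intersection to be non-empty once.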
these UBA tools and these AI-based defensive tools depend on pristine logging data. Unless you are cloud native by design, observability by design, AI-based SOC by design, up front, it's going to be very difficult to have the pristine logging data in place for the benchmarking and baselining that these AI algorithms require in order to stifle an attack. Another interesting thing is you can't trust your SOC's response time or your MSSP's response time. It took a very well-known MSSP over seven hours to detect and respond to the initiation of a pen test, and their SLA was five minutes. That's because those MSSPs are also overwhelmed in how they're executing against alerts, so you can't trust them; you've got to verify their effectiveness, and then collaborate with them to improve their detection and response time. And what's amazing here, since we pioneered the use of AI-based pen testing, is that we had the son of one of our sales reps, this nine-year-old kid, not a technical person, in four minutes and 12 seconds successfully compromise a high-end bank that had every Gucci tool you could buy. Four minutes and 12 seconds. Of course, the bank permitted him to do that. Last year it took seven minutes, this year it took four minutes, and next year it'll take less than 60 seconds. So think about it for your own security organizations: in less than 60 seconds, can your SOC analysts characterize the alerts, get approval to take some sort of defensive action, and then actually do something to stifle that attacker from becoming domain admin? And the answer is no. Humans are quickly going to become the bottleneck, because the future of cyber warfare is algorithms fighting algorithms, with humans by exception, and things are going to get far worse, I think, before they get better. And this isn't fear mongering, it's just an extrapolation of logic: if I can compromise an environment in 60 seconds, and the bad guys are adopting the same tech, we need to very quickly improve the effectiveness on
the defensive side in order to have a fighting chance at responding fast enough to stifle those attacks appropriately another key part here is these security tools are super complicated so as you're executing an attack what you want to understand is hey we successfully dumped credentials from vCenter on this IP at this time did you detect us did you log us did you alert on us did you stop us and how do you use that to tune your security tools in the past two weeks this is a bit of an eyesore but I'll walk the data real quick in the past two weeks alone through our AI based attacks we dropped 553 implants upon gaining code execution so that means we were able to compromise a host and from there we dropped a remote access tool on that host that ran as a privileged user and then from there 60% of those implants were successful which means the EDR failed to detect and prevent the implantation and these were a mix of the top three or four EDRs out there in market so a 60% success rate which means a 40% effectiveness rate of the best EDRs in the market of those not only did we successfully dump SAM and then dump LSA and then dump LSASS and then get telemetry and persistence we were able to do that in 30% of all those implants so we successfully defeated every aspect of an EDR 30% of the time and these are the best tools in the market so why did they fail it's not because the tools are bad it's because they're super hard to configure and verify that they're actually working correctly so this brings me to two key points here at the end the first one is when you look at the primary ways to compromise an organization number 10 is CVEs you know all the CVEs that we panic about in the news that's only a small fraction of how attackers successfully compromise the bulk of them have to do with weak or default credentials misconfigurations unpatched services that people aren't paying attention to like HP iLO and Dell iDRAC credential spraying techniques and so on it's not
vulnerabilities in your software supply chain it's not exploitable CVEs that are in the news it's these types of techniques and this is the fundamental problem and the problem we have is we need to rethink our community approach to cyber security because if we want to have a fighting chance against those Blitzkrieg style cyber attacks where in less than 60 seconds we're compromised community is how we win and so what I see are two forms of community here the first is cyber range as a service when you think about like there's a new Cisco IOS XE vulnerability that came out a couple of days ago the attackers have bootlegged versions of every Cisco product and binary and the moment a new patch is dropped they do a binary diff to see what changed from this version to the previous version and they look at the code change and they reverse engineer an exploit and then they go off and weaponize it well ethical hackers live by a different set of rules they don't have access to every version of that Cisco product so it takes them days or weeks or nefarious mechanisms in order to get those versions to properly ethically research and build that exploit and you need that exploit to verify that you fixed the problem there is no home for cyber range as a service where the first step is that library of various products for the sole purpose of rapidly creating exploits to test and verify remediation that's a low-hanging fruit item the second is there's no collaborative space to do benchmarking and tuning think of the WebSphere days when I was at IBM we had SPECjAppServer and all sorts of benchmarks for WebSphere performance there is no equivalent of that for security tools to make sure they're tuned correctly and configured correctly and there's no place for common configurations where if you want to be security by design you need a place to actually figure out what the right security by design is I was speaking to the chief customer officer of a very large virtualization company
recently acquired by a very stodgy company in New Jersey you can kind of figure out which company I'm talking about and that chief customer officer said hey if my customers can't read our documentation and secure our product it's their problem not ours and I mean I flipped a chair at the end of that meeting because that's the mentality for some vendors and if we want to be secure by design and actually do something meaningful we need a place for researchers to come together and define those architectural patterns of workloads representative of the environments that they're in not just the cloud native world but the entire landscape of hybrid cloud and then the second area here of minimal effort maximum impact is around a repository for remediations what I mean by that is just take the fix action documentation so in this example a misconfigured JMX server led to code execution which led to a RAT being implanted which led to domain admin well if you want to go fix that you've got three options disable JMX whitelist via firewall or configure authentication this documentation is not housed in any repository and this is a community asset this is something that everyone in the community is going to benefit from so the very simple task of documentation for how to fix problems that is maintained by the community and kept accurate and truthful and effective just doesn't exist and this is a unique opportunity for us to start to seize the moment the next thing here is think of detection engineering and indicators of compromise these are basically personal git repos that are out there there is no place to bring all this together and it's a big hole in the community and then the final part is production-safe source scripts once again there is no place out there that's bringing these together they're individual git repos scattered and as a community and as a foundation we have a very unique opportunity to rethink what community means to us in cyber
security so I know I'm a minute over but I'll leave you with one last story the Japanese armed the Ukrainians earlier this year and allegedly the Russians got pissed and they ransomwared a small manufacturing company in Tokyo that manufacturing company supplied all of the cup holders to Toyota Motor Company it caused the shutdown of 28 production lines and caused almost 400 million dollars of economic damage because of just in time logistics and lean manufacturing if you pick the right bottleneck in the manufacturing process everything stops the flex here is not that the Russians ransomwared a cup holder company the flex is they knew where to apply the least amount of effort to cause the maximum amount of economic harm below the threshold of war and that is not the future we're living in that is the present that we're living in and so there's never been a more critical time between the acceleration of attacks through AI and the economic warfare that we're starting to see in the realm of cyber if we don't solve this now we're going to be in a world of hurt and so this is the chance I think for the foundation to get together and do something of significance and consequence in the security realm so thank you for your time I appreciate it all right you and I are going to follow up on this you know I tend to avoid Black Hat and RSA and just halfway through your talk I was like I'm depressed again it's just such a bummer well Black Hat's about Snoop Dogg and Train and concerts now they don't actually talk security maybe I should revisit my philosophy here but I like how you ended on a positive note in terms of like hey there is a way we can collectively tackle that so awesome I'll follow up thanks man all right appreciate it thank you all right our next speaker is someone who we all know very well and he represents hopefully not an extinct form of journalism but one that will long live on Jon Corbet as the editor of Linux Weekly
News and for so many years has done an amazing job of just going deep into an important topic with you know deep technical insights you know deep uh you know communication across the community about everything they need to know to do the great work they do so let me welcome to the stage Jon Corbet and subscribe to Linux Weekly News thanks Jim hello everybody um yeah I'm Jon I am among other things the maintainer of the kernel's documentation subsystem as such I live very much in the kernel world I mean I think we will agree that the kernel is a key part of our whole open source ecosystem so when we start to see kernel maintainers saying things like being a maintainer feels like a punishment and this can't stand or maintainers are burning out this should get our attention we should start wondering what is going on here and what can we do about this so that's what I'm here to talk about what is going on and what we can do about it to get there I need to first talk just a little bit about what kernel maintainers do so that we can understand what the nature of the problem is once upon a time in the early days if you had code that you wanted to get into the kernel you'd package it up into an email and you'd send it to this guy and he would um look it over and send you something back sometimes that you may or may not want to see and eventually if you're lucky apply it to the kernel and this worked for a while but if you look at what the kernel community is doing these days you see in the last year we put out about six releases 6.6 coming this weekend probably each one being a major release incorporating something like 14 to 15,000 changes each one incorporating the work of about 2,000 developers for a total of about 86,000 changes and about 5,000 developers all contributing to the kernel over the course of a single year this is not something that a single person can keep up with even if that person is Linus Torvalds right and so we ran into scalability problems there
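The scalability point is easy to check with back-of-the-envelope arithmetic. A minimal sketch follows, using the rough figures quoted above; the ten-minute-per-patch review time is my own illustrative assumption, not something from the talk:

```python
# Rough figures from the talk: ~86,000 changes merged per year across
# ~6 releases, contributed by ~5,000 developers.
changes_per_year = 86_000
releases_per_year = 6
minutes_per_review = 10  # assumed: an optimistic time for one careful review

# Changes per release and the daily review load if one person did it all.
changes_per_release = changes_per_year / releases_per_year
patches_per_day = changes_per_year / 365
review_hours_per_day = patches_per_day * minutes_per_review / 60

print(f"~{changes_per_release:,.0f} changes per release")
print(f"~{patches_per_day:.0f} patches to look at every day")
print(f"~{review_hours_per_day:.0f} hours of review per day for one person")
```

Even with a generously fast review time, a lone maintainer would need roughly 39 hours of review per day, which is why the delegation hierarchy described next became necessary.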
now being kernel developers we kept on doing it that way anyway for a fair while after that but eventually we realized that we had to do things a little bit differently and so the mechanism we came up with was to delegate responsibility to a whole hierarchy of maintainers so if you are a kernel developer now and you have some code you want to get into the kernel you will send it to the maintainer who's responsible for the particular subsystem that you're working with if you have a documentation patch you'll probably send it to me if you have a SCSI driver patch it'll go to the SCSI maintainers if you're dealing with networking you'll go to some other people that maintainer will look it over review it perhaps tell you what needs to change to make it suitable for inclusion and so on and eventually apply it to their own repository which is separate from the repository maintained by Linus Torvalds that maintainer may eventually send things upstream to another maintainer who maintains a larger portion of the kernel and so on things eventually converge still on Linus Torvalds who will take a whole batch of changes or leave it once it gets to him but much of the work of the actual selection of work to go into the kernel is not done by Linus it's done by all these maintainers who work below him now this is kind of an abstract diagram of how things work if you look at the way patches actually flow into the mainline repository you can see how it works in the real world now I have to apologize that the fonts on the next slide are just a little bit hard to read I will zoom in but I wanted to put this up here just to say so this is the picture of the hierarchy we actually have in the real world each one of those little boxes that you can't actually see is one subsystem repository with one or more maintainers managing it and they're organized into a tree and they all converge on that box sort of on the lower right there that's Linus way on the right hand side of that
little diagram now if you look more closely this is a piece of that diagram this is the networking subsystem right that box on the right is the net-next tree which is where most of the work regarding networking in the kernel flows in the net-next tree feeds probably two to three thousand patches into the mainline with every kernel development cycle so a lot of work flows through that tree there's a lot of boxes converging on it so in that middle column the top box is the BPF repository because all the work for the BPF subsystem still goes through the networking tree the one below that is for wireless network drivers and that has a couple of driver-specific boxes feeding into it each one of these corresponds once again to a maintainer who is dealing with a specific part of the kernel and sending patches upstream so that's how the structure works what the maintainers do to make this structure work falls into several different categories starting with setting the direction for their subsystem as a whole every maintainer has some vision of where their part of the kernel should go where the technical debt is what challenges are coming in the future and so on and works to ensure that all the code coming in through that subsystem is consistent with those directions maintainers of course have to review patches and make sure they are suitable for inclusion into the kernel it's a big part of a maintainer's job once they have accepted patches they have to collect them in repositories send them up to the mainline at the right period of time big feature changes have to go during the merge windows urgent fixes go at other times and so on there's a whole process around that it may surprise you to learn that kernel developers can be opinionated people so we have some pretty strong disagreements within subsystems we have people working towards a common goal but with very different ideas of how to get there you know it's often up to the maintainer to step in
and get these people to see a way to move forward that satisfies everybody's needs it can often be a big part of a maintainer's job especially in certain subsystems patches that go into the mainline often are fixing problems bugs that need to then be backported into the stable kernel versions that are what we're actually running on most of our systems some maintainers participate in this process more than others but it is another big job because thousands of patches have to be backported into the stable trees and then sent through that separate path maintainers are the interface to the subsystem for a lot of people besides the developers so they have to deal with vendors if you are a maintainer of a driver subsystem you're probably talking with the manufacturers who are making that sort of hardware right so you understand what's coming in terms of products you're often getting patches from those vendors and so on that's a whole set of relationships that a maintainer has to manage and similarly with users and a user could be somebody whose laptop doesn't work right but by users I'm also counting distributors and counting manufacturers who are embedding Linux in their products and so on counting people running large data centers these are all users who will talk to maintainers when they have issues that they need resolved in a particular subsystem that again is a big part of a maintainer's job and then late on Friday evening if the maintainer has time perhaps they actually do some development within their subsystem which is how they all started there but often they don't have time to do that anymore so that's a fair amount to do where are the pain points in all of this and I have to start by saying that certainly some of the pain points that maintainers feel are self-inflicted starting with the problem of insufficient delegation this list that I just went through is a lot of stuff for one person to do but there are a lot of subsystems where in fact one
person is doing all of this it would help a lot if they would simply delegate that work out to other people who want to help with that some subsystems are very good about that others less so it is something that we're working on we also have process issues within the kernel we can be quite contentious at times it can be very hard to get work in and we can be very bureaucratic it can sometimes be a hard place to work and it makes life harder for developers and maintainers both this too we are consistently working on we have gotten better we will continue to do so but there's a lot more to it than this starting with the fact that the demand on maintainers has been growing quite a bit over time the complexity of the kernel has grown a lot the original kernel release over 30 years ago was 10,000 lines of code now we have individual subsystems with 100,000 or even millions of lines of code in them maintainers have to handle a whole lot more than they used to we have issues like hardware vulnerabilities we have scalability problems we have all these sorts of things that are pressing on maintainers and they have to handle a lot more complexity than they once had to we have the issue of new languages and new tools one thing I want to call out here is the current experiment around the incorporation of the Rust programming language into the kernel development process now I am actually very much in favor of this experiment I think Rust holds out a lot of promise for a kernel in the future with a whole lot fewer bugs a lot fewer security vulnerabilities and a kernel that is perhaps more attractive to today's generation of new developers but Rust is not a simple programming language and it is not all that much like the C language used for kernel development it's a lot to learn it takes quite a while to get good at Rust if you are a kernel maintainer and you're going to start receiving submissions written in the Rust language you have to understand the
language at a very deep level to be able to review those patches to be able to maintain that code going forward to know that you can fix things in it if you have to and so on this is a lot to ask of maintainers to learn this new language when they are already busy and overwhelmed with the work that they are doing now so this is going to be a problem going forward and as we hopefully adopt other useful tools this will come around again and again and again where there's a short term cost that brings a long term benefit but that short term cost hurts our expectations for response to regressions have grown quite a bit we've always had a rule that you can't break the kernel for users but we now have a regression tracker for example who will nag maintainers the expected response time for regressions is often measured in a few days so maintainers have to be always ready to deal with these sorts of problems I want to call out fuzzers and fuzzing techniques fuzzers are testing tools that feed random data or directed random data into a system like the kernel see what breaks and then put out a report saying you may have a bug here you may have a vulnerability here fuzzers are incredibly valuable tools they've helped us find hundreds if not thousands of bugs and fix them before they affect our users but fuzzers also generate an awful lot of output many many reports we're getting thousands of them and even if all of these reports were good somebody has to go through them all somebody has to understand them all figure out which ones matter and deal with them and that task again falls on the maintainer most of the time add to this the fact that a lot of these reports are not good a lot of them are duplicated there is a real incentive currently among people to generate security reports and take credit for having found a problem and so we have a lot of people running fuzzers and cranking out reports without really verifying that they've found a real problem or helping
in any way to solve these problems and this is overwhelming maintainers with all this data coming in and a related issue is shenanigans with CVE numbers again there is a real incentive among security researchers or people who want to be known as security researchers to take credit for the assignment of CVE numbers to alleged vulnerabilities and so we're seeing CVE numbers assigned to things that are not security problems at all we're seeing CVE numbers assigned to things that have never actually appeared in a released kernel and every one of these is a problem for a maintainer who has to answer questions from users asking why hasn't this CVE been addressed or they have to try to get a CVE unassigned which is a painful process and so on this is a growing problem throughout the open source community and very much a problem for kernel maintainers in a world where we have 5,000 people contributing to the kernel every year understaffing might seem like a strange thing to complain about but still we're hearing complaints because the kernel is big and there's a lot of work there and so we have maintainers saying I don't understand why we are understaffed and overworked when we're working for companies bringing in hundreds of billions of dollars a year in revenue there are simply not enough people to go around out there to handle all the work that we have to do and that brings stress on maintainers who have to take up the slack much of the time and related to that is the problem of employer support companies like to hire kernel maintainers they don't always like to give them time to actually be kernel maintainers and so a lot of people who are working as kernel maintainers are fitting that work in on the side outside of the work that they are actually being paid to do they're not being evaluated for that work they're not being credited for it this is not really in my mind an ethical way of doing things it's not an inclusive way of doing things and it's certainly
not the way to get good maintainership out there maintainers need to actually be doing that work as part of their jobs but many of them currently are not this is part of something that I described as dark areas in the kernel and beyond as well there are a lot of areas that even in a project where most of the people are paid to work there no company feels that it's their problem to support so documentation of course being my pet peeve in this area we have 5000 people working on the kernel 90 percent or more of them are paid to do that work there is not one person whose job it is to write documentation for the kernel and our documentation reflects that okay our build system is a thing of astonishing complexity it is maintained by one person and nobody else really wants to touch it I hope that person stays around a lot of core kernel areas too if you look at what had to be done I know Jim worked on this to get the real-time work supported in the kernel despite the fact that this work is shipped by an awful lot of vendors nobody really felt the need to support that work all right companies famously don't want to support work on their older hardware they want people working on the new hardware and buying the new hardware and so on and maintainers are another thing that is on this list for the companies it's just not their problem it's not the immediate problem they're trying to solve when they work on the kernel and so they don't want to support it so for the one or two of you who haven't seen this cartoon a thousand times already it's still as relevant now as it was when it first came out it's a problem throughout the community and it's a problem here there are certain things that we just aren't supporting well and maintainers are part of that so enough complaining what can we do about this well I have a few suggestions for people here and for the companies that they work for starting with let maintainers do maintenance as part of their job evaluate them on that give them credit
for it because this is not happening and it is what we really need to do if we want to have good maintainership in the kernel but support maintainers in other ways too and if there's one key point the point that I would like people to take away more than anything else in this talk it's patch review if you are submitting patches to the kernel you should be reviewing patches submitted to the kernel somebody has to do that and if all you're doing is submitting code then you're putting a load on the maintainer side of the equation without doing your part to help we've done a very good job in this community of making the point that if you're using open source software you have to give back to it you have to support it what we've done less effectively is make the point that submitting code is not the sum total of that we need more we need to support the process as a whole and patch review more than anything else if your developers are not reviewing patches they need to be it's good for their own professional development and it's what makes the process work okay we need support for subsystem level development there's often a lot of stuff that needs to be done for the infrastructure of the subsystem as a whole again that often falls on maintainers they need help working on that sort of stuff and we need better tools I would claim that in the kernel community we have under-invested in development tools to help with this and it has often hurt us this is the community after all that worked without a source code management system for the first 10 years of its existence all right um and the thing is of course when we solve these problems we often change the world look what happened when we did decide to adopt a source code management system right that resulted in git and that has changed things and this is something that has come around many times it's gotten better we've had some good support for testing systems and continuous integration and so on that we
didn't used to have in the past we've gotten some very good maintainer tools that have come by way of Konstantin in particular at the Linux Foundation who has done some really wonderful work and has helped to transform the maintainer role and make it work better than it did but it's just the beginning we need a lot more support for development tools for kernel developers and beyond to help this whole process work better and we of course cannot forget that once we have these tools they need maintainers too just to close here as part of our work in the technical advisory board at the Linux Foundation we put together a document this last year that we call the Contribution Maturity Model it's a way of looking at how a company works in the kernel development community rating that company's performance and making suggestions for what companies can do to improve their support for the kernel development community as a whole it's in the kernel source or it's at the URL that I put there at the bottom I would encourage everybody to have a look at it think honestly about where their company stands in the spectrum that we've laid out there and what could be done to move to the higher levels if we can all do a little bit better at supporting the maintenance process then we will have a much better kernel that will last us for the next 30 years and beyond Scott McNealy for all his faults made the point that open source is free like a puppy is free all right and you know he was not wrong there it's really easy to bring a puppy into the house but if you don't then pay attention to it and train it and so on you're going to have big messes on the floor and your shoes will be chewed up right the same thing will happen with open source software right if you don't pay attention to the maintenance of it you're going to have a mess to deal with in the future so I would invite us all as we're talking over the next few days to think about how we can take better care of this particular
puppy and all the puppies that we have adopted over the years thank you Jon you know as you were talking I was looking back at some of the team Nirov is in here from the Linux Foundation who's working on some of our analytics tools to answer some of the requests that you have one of the things that we could use maintainers for and we've asked a lot of maintainers to help with this is to help us gather data so that we can convince your employers to give you more time let me give you one example we're building a tool that can show in a project or in a subsystem what developers are working inside or outside of normal business working hours and the data shows that you're all either vampires or you're doing it outside of your work hours because you've got so much work to do if I can go show that to someone who heads an OSPO or a head of engineering at a large company you might find we can convince them to give you all a little more time and that's just one of many examples well I agree with everything Jon said so let's continue the conversation not just with the kernel community but all our project maintainers to give that help so I really appreciate all of your talks Jon our next speaker is from the Apache Software Foundation someone who I've known for a long time recently had a conversation with at our open source foundation summit in Geneva and we both agreed that we're getting older hopefully wiser David Nalley also works at Amazon Web Services he's the head of open source strategy and marketing there and today he's going to talk to us about topics that are a challenge in open source around policy and security and so forth please welcome David Nalley so the first thing I want to tell you today is that open source has won over the course of several decades open source has transformed from something that was primarily people sharing patches via email to solve problems that they all commonly had to become the default development methodology people use for
software development today perhaps unintentionally it's also become a market definer and you know it's notable that once an open source project reaches a certain critical mass proprietary software regardless of how well funded cannot compete with it there's example after example of this we could talk about the Linux kernel and those proprietary Unices are all but a fading memory we could talk about Kubernetes which has become the default way in a very short period of time that people manage schedule and allocate resources at scale I always knew that we were winning though you know benefit of hindsight here I knew that open source was ubiquitous I knew it was everywhere and then something happened to make me really understand what ubiquity meant in open source it was Log4Shell and I went from understanding that open source was winning in software to open source is winning in refrigerators and phlebotomy machines open source truly is everywhere of course not everything is a big named project like Kubernetes or Linux or PyTorch a lot of what we're actually doing is building small components and libraries that are able to be reused Synopsys who makes a software composition analysis tool called Black Duck said in their 2023 report that of all of the code bases that they saw 96% contained open source I don't know what the 96% number represents to you to me my first reaction is that Black Duck may have 3% or so of margin for error here because it feels like open source is far more pervasive than just 96% but buried more deeply in that report was an even more interesting number and that number was the percentage of code bases as measured by lines of code that represented open source in the code base and the average code base that they saw contained 75% open source so 75% of the average code base that you come across is open source I'm not really here though to talk about the fact that we won or even how we won I'm here to talk about a couple of the problems
that winning has presented us with and these are just two because I have a relatively limited amount of time to talk with you today but the first is something that really isn't new but has seen a lot of increased activity over the past 12 to 18 months and it's something that for many of us perhaps it's the first time in our lives that people want to be just like us or at least like our software and so they have taken advantage of calling their software or their other technology elements open source and we heard Brian talk about this a little bit earlier this is a big problem because it's going to cause problems for our users over the long term they're not going to understand the freedoms that we expect them to have when we call something open source we even have folks wanting to take the halo effect that open source software has earned and apply it to brand new technologies like open source AI I'm still not sure what open source AI might look like is that open source data are open source transformers are the models themselves open source I don't know I'm glad there are people who are working on that actively but I think it is important for us to hold fast on that definition for no other reason than to benefit the users who make use of the software and other technology that we produce they should have the same freedoms that we are used to when we are talking about open source software but the second problem that we have is that the entire world has realized that open source is everywhere we have governments now who want to regulate us some of those are well-intentioned some of them much less so and so we're now entering into a phase where open source is doing amazing innovative work critical work it is the foundation that we build upon 75 percent of the average code base is open source and we are now having folks who also realize that and instead of hearing wow it's amazing how innovative you are they're saying yes we realize how important you are and we
realize we must regulate you." These are new challenges, and they're not technology challenges. I didn't come here to whine about problems that we're facing due to our success. I came to member summit to talk about some of the problems that I see, because this is where leaders come: leaders in companies, leaders in open source organizations. I often joke with Jim that member summit is the Davos of open source. So my plea to you today is this: as you go back to your organizations and companies, think about what's going to happen in 2024. How are you going to help open source deal with the sustainability problems that John was talking about earlier? How are you going to help us deal with the fact that folks want to overload the definition of what open source is? How are you going to help us deal with the fact that governments are seeking to regulate us in ways that would change the very existence of open source? I don't have solutions for these today, but I know that this particular audience has the capability to work on them. Thank you so much.

What is it? "With great power comes great responsibility" is the other cliché, and I think your talk stated it well, Dave. Our last speaker is going to circle back to the topic of AI, specifically responsible AI. After a decade of service in the Austrian public sector, Katerina relocated to Silicon Valley and has since focused her career on tech policy, privacy, security, and regulation. She's the founder of the AI Education Network. Please welcome to the stage Katerina Kerner.

Hello, everybody. Thank you so much for having me. I'm going to address a complex topic. As we have heard, there are a couple of questions that have not been answered yet, and this is working. So it is indeed challenging to talk about responsible AI in the context of open source, as our starting point is this very complex and conceptually difficult relationship between open source and AI. The open source
definition, as you all know best, has been super robust for decades, and today, with the rapid growth of AI and machine learning, the definition faces new challenges. AI and machine learning involve various components, and we have to explore how the open source definition as we know it can adapt to or include those new elements. And of course, as you addressed in your talk earlier, the public release of AI/ML components such as LLMs, with Meta's leading example of releasing Llama, has contributed to this conflation of the term open source: sharing the trained model, but not sharing the training data or the code used for training. But what is responsible AI? The terms responsible AI, ethical AI, trustworthy AI are kind of omnipresent, at least in my bubble, and they can sometimes still seem like fluffy terms. Actually, they're not; responsible AI has a pretty clear profile by now. Responsible AI is a set of good governance guidelines composed of a set of common principles. These usually include privacy, data governance, accountability, auditability, robustness, security, transparency, explainability, fairness, human oversight, and promotion of human values, or the alignment issue. There are many sources for those responsible AI principles, but there is nevertheless a lot of overlap among them. For example, UNESCO has defined them, as have the Council of Europe, the OECD, and the European Commission. We have countless self-regulatory guidelines within companies: Microsoft, Facebook, Google, and Salesforce all have really great and very useful guidelines to download, with playbooks on how to operationalize them. We have the Partnership on AI, we have the IEEE list; there are so many organizations focusing on those principles right now. In this context I also want to mention the upcoming EU AI Act. It's currently in the last phase of negotiation, it could be passed early next year, and it will take 18 to 24 months to come into force. I mention the EU AI Act
because it also incorporates those responsible AI principles, so there is overlap, and things are moving in the same direction. The EU AI Act will, as the GDPR did, have extraterritorial effect, so it will be super relevant for the US as well. It aims at protecting fundamental rights, and it will classify AI systems into different risk categories with different requirements. And what we see here (I just wonder, you know, my notes on how to scroll down, I cannot really scroll down), in the current version from the European Parliament, because there are currently three draft versions being negotiated between the European institutions (thank you so much, someone scrolled down for me): the European Parliament, luckily enough, introduced an exception for open source licenses. The regulation shall not apply to AI components provided under free and open source licenses, except when they are part of a high-risk AI system or a foundation model. We also see this in the recitals (recitals are more like interpretation tools for the regulation itself). They mention that developers, even though their AI components might be free and open source (whatever that exactly means is also not super clear yet, so we have the same problem here, maybe incorporated into law if we don't pay attention), should nevertheless take care of documentation practices such as model cards, to really pass this transparency information along the EU AI value chain. And, as you see in the middle of the slide, as soon as you charge a price, apart from those exceptions, you are also no longer within this exception. There is a recording of a presentation I gave on the EU AI Act and open source on the OSI website, so if you want to learn more about this, you could look up that presentation I gave last month. Of course, talking about the intersection of responsible AI and open source, it is super important to note that there exist so
many open source resources to do responsible AI better. We have responsible AI toolkits and frameworks, such as IBM's AI Explainability 360, the Model Card Toolkit, Microsoft's Responsible AI Toolbox, and so on. I really want to stress that this is another example of how the open source community has contributed significantly to the responsible AI ecosystem and its operationalization. But we also have some great projects addressing not only open source for responsible AI but responsible AI in open source. I collected some examples: Mozilla has a working group on trustworthy AI, the Linux Foundation addresses open source governance for ethical AI, responsible AI licenses are a big topic in development, and of course Hugging Face has a great expert team working on responsible AI and a very good newsletter that addresses these topics, which I can really recommend signing up for. Of course, there are still some challenges. One example is governance and accountability. Responsible AI is basically all about governance, and while the Linux Foundation does place a strong emphasis on governance guidelines, we know (or you know far better than I do) that there are open source initiatives that grapple with governance, and that manifests in various ways: some projects have no formal governance, while others have some form of governance, or ad hoc governance. If those governance issues are not addressed, they can have far-reaching consequences, including a lack of adherence to responsible AI principles within those very open source communities and downstream. I think that could become a problem if it is not addressed, because it could set a bad example and lead regulators to look at this more closely. Then we have the topic of bias mitigation. Here I can really recommend a specific blog post on the Hugging Face website. Bias is an extremely complex, multifaceted issue with no single one-size-fits-all solution. It's not only technological; it's also related
to the broader social, cultural, and historical context: who is on the team, what's in the dataset; it is reflected in all of those things. So I can really recommend this blog post, because it lists very concrete tools, approaches, and techniques to address bias throughout the AI lifecycle, among them the by-now classic and very successful initiative from a couple of years ago of model cards, and I will get to that a little later. (I still have 11 minutes? Okay.) The third topic I wanted to highlight, and of course cannot tackle in depth because it is also huge in responsible AI, is security. We have a lot of development in this space as well. For example, the Cybersecurity and Infrastructure Security Agency recently released guidance on how to improve open source software safety, and we have tools like Google's deps.dev API, which provides free access to the data that powers the website and can point out dependencies. This is a complex topic; I can only scratch the surface, and I just want to give some pointers so that, if you are interested in one aspect or another, it's easier to get an overview and dive deeper into whatever interests you most. As we anticipate the EU AI Act, I think it's essential to recognize that responsible AI and upcoming regulations will likely impact all stakeholders in the ecosystem. It will really be a ripple effect, like throwing a stone into the water. We do not yet know exactly what it will mean for all of us, or for the open source ecosystem, or what open source AI will mean and exactly what its components are. So there are a lot of questions and not so many answers. But I think this regulation will come nevertheless; responsible AI is totally on the table. So it is really necessary, or best, to strengthen best practices and open source AI documentation, and also to focus on transparency, because that
can help downstream, for example by using model cards to communicate AI model details and ethical considerations. I also put together this slide with some ideas. It would be great to discuss those, because I would really like to flesh this out a little better. But what I can already list here, recommendations that came from the other sources I mentioned, is to establish and promote an AI ethics advisory group or a working group within your project, and also to put together ethical deployment guidelines. I usually talk a lot about AI governance in companies, organizational governance; I did a study on AI governance in organizations, and they are all very much at the beginning with AI governance in general. The usual recommendation on how to start is: just start small, and just start somehow. Get together some people who are interested in it, discuss which principles are relevant or important for you and for your project, then take a concrete example and discuss how those principles would apply to that very project. This is really the recommendation from all the people I talked to in bigger organizations, such as IBM, which has great AI governance, and I think we shouldn't make it too complicated from the start; just really get people together. That's also the second aspect here, because bias is not only in datasets, or computational bias; it's also about diverse contributions, so a diverse team is always best to have, and there are also fairness toolkits available for use. Then there are transparency measures, for example using monitoring tools or model cards, which are a big, big topic in responsible AI. Then there is a tip I came across to have package managers for accountability, so that pre-deployment there is a person who knows exactly what is going on and makes sure all those aspects were addressed before getting into deployment. And then I came across
the idea of creators for open source, so either monetized open source, who will then deploy those tools; I think that's a very good way to go. And last but not least, there are various programs to promote responsible AI principles in open source, for example the Secure Open Source Rewards program and the OpenSSF, and I think every project where you can bring in contributions and engagement from the community is great for this really important topic. Finally, in this context I also want to conclude with the question of whether we need new licenses and what they would look like, but we will also have a panel on this tomorrow, so I'm looking forward to discussing it then. There's the Open and Responsible AI License (OpenRAIL) initiative, a license that enables open access, usage, and sharing of AI artifacts while promoting responsible use. I don't know how much this will be picked up, or how much it will grow. They make the comparison that open source software licenses apply to code and Creative Commons to general content, and that those OpenRAIL licenses should likewise apply to responsible open source AI, to empower the community with tools to really be transparent about their initiatives for open and responsible machine learning. I hope we can discuss this further. If you have anything where you want to correct me or help me learn more, that would be great, because I really love to educate others while I'm learning myself. Anything that you tell me, I can then tell others, and that would be wonderful. So thank you very much for your attention.

Thank you. Now, we've definitely seen some calls out there for stopping open source in foundation models. A lot of those arguments tend to be very generic and vague: someday these models will create a super bioweapon, or they'll be used for some nefarious purpose. I'm not buying it. I think that in order to have transparency, trust, and attribution, you have to have open source foundation models, so that we can figure out how these things work and build systems like we just heard about.
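The model-card practice recommended in the talk can be made concrete with a small sketch. This is a minimal, illustrative Python example; the section headings and the example model details are invented for illustration and do not follow any official schema (real projects often adopt the Hugging Face or Google model card formats):

```python
# A minimal sketch of a model card generator, illustrating the
# transparency practice described in the talk. Section names and
# example details below are hypothetical, not a standard schema.

MODEL_CARD_TEMPLATE = """\
# Model Card: {name}

## Intended Use
{intended_use}

## Training Data
{training_data}

## Ethical Considerations
{ethical_considerations}

## Limitations
{limitations}
"""


def render_model_card(details: dict) -> str:
    """Render a markdown model card from a dict of details."""
    return MODEL_CARD_TEMPLATE.format(**details)


card = render_model_card({
    "name": "example-sentiment-classifier",  # hypothetical model
    "intended_use": "Research on English-language product reviews only.",
    "training_data": "Public review corpora; no personal data retained.",
    "ethical_considerations": (
        "May underperform on dialects underrepresented in the "
        "training data."
    ),
    "limitations": "Not evaluated on languages other than English.",
})
print(card)
```

Checked into a repository alongside released weights, a card like this passes along the value chain exactly the kind of transparency information about intended use, data, and ethical considerations that the talk, and the EU AI Act's recitals, describe.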