Other than that, commercial's over. Our next speaker is Joel Orlina, who is sharing Maven's security journey. Joel is an engineering manager at Sonatype, which just happens to be a sponsor, so go check out their booth. He helps with the care and feeding of Maven Central while also contributing to product development. Please join me in welcoming Joel. Take it away, sir. I apologize a little bit for not having the displays on the right screen. I think that's right. Yeah, I'd like to have the speaker view on my main screen here. And yeah, not that secret, apparently. So thanks for the intro, Crobe. My name's Joel Orlina. I'm an engineering manager at Sonatype, and I've been with Sonatype since 2010. As an engineering manager, I support multiple teams, one of them dedicated not just to the continued operation of legacy services around Maven Central, but also to some of the future plans: new services that we're building to modernize and improve people's interactions with what is the single largest repository of open source components for languages that target the JVM. I used to say that Maven Central was primarily for Java developers, but the truth is that if you develop in Scala, in Kotlin, or in Clojure, all of which target the JVM, there are components in Maven Central for all of those languages. So I'm going to start with some definitions and a little bit of storytelling around what I think of when I hear "Maven Central." When I first gave this talk, it was at a developer-focused conference, primarily for Java developers. I doubt that we have a majority of Java developers here, so I hope some of the story resonates and gives you a bit more background on what a software package repository like Maven Central is, what it consists of, and what it takes to run it. To help tell that story, after the definitions I have some high-level architecture. I won't get into the nuts and bolts. I'm here at the coffee break.
I'll be here for the next few days. Feel free to come find me if you're interested in seeing more of the nuts and bolts. When I first gave this talk I had more time, so I may skip "Central by the numbers," a bit of statistics around its growth and its contents, though I'm more than happy to share that as well. If we have time available, I might come back to it. What I really want to get to is a bit of operational description. I've seen many presentations today where people go back into ancient history; I'm going to go back to 2021, which was a particularly eventful year, I think not just for everybody here, but certainly for the team at Sonatype maintaining Maven Central. We'll take a look at certain events that punctuated that year and had security ramifications for the way we operate Central, both for our users and for us as maintainers. The last few slides focus on the future. I have a slide with a prototype of the next generation of the publisher portal that we're actively building, and a slide about our involvement with OpenSSF, which has been incredibly fruitful to date. I'll get to that at the end and, time allowing, I'll take a few questions before letting everyone go off to get coffee. I feel like I'm in the unenviable position not just of keeping people from caffeine, but of following Mr. David Wheeler. Thank you, that was a fabulous presentation, and I hope to keep the energy going. All right, what is Maven Central? When I give this talk to Java developers, I have to ask them to indulge me a little bit, because they already know what it is. This illustration is one of many in a Wikipedia article. If you go on Wikipedia and search for "blind men and an elephant," you'll turn up a great article about a parable from the Indian subcontinent.
It's very old. The first written evidence of it is, I think, from around 500 BC, but purportedly it's much older than that. It tells the story of five or six blind men who, for whatever reason, are asked to describe an elephant. One of them manages to wrap his arms around a leg and says, "Oh, it's a mighty tree trunk." Another grabs the tail and says, "It's long, like a snake." Another, I think, grabs the tusk: "Oh, it's pointy, it must be a spear." Another grabs the ear: "Oh, it's a giant fan." Someone in the back says it's a great wall. And Java developers actually don't think about Maven Central at all. The original abstract of this talk said that Maven Central is like the stars, or like electricity: you don't think about it because it's always there. If it's dark and you look up, you see the stars. That's what Maven Central is, something that operates all the time without our noticing. But part of my job is supporting the community that uses Maven Central, and the support requests I get remind me that, depending on how people use Maven Central, they have a very different experience of it, much like the six blind men and the elephant. For the most part, those of you here who are Java developers will happily type mvn clean install. After Maven is done downloading the internet, you have a JAR file that you can send downstream. That is the bulk of the experience of Maven Central. But there's also a large community of publishers, the people who are actually responsible for putting those great open source bits out there, and they have a very different experience. Part of the Central team's responsibilities at Sonatype is serving that publisher community.
We also offer a service, and you'll see this on the architecture slide, that lets people do a bit of research on components: component metadata, age, and the data that's in the POM, the Project Object Model. That's also a service we support, and it's a very different experience from the other ways people use Maven Central. That's where most of the overlap between this illustration and Maven Central lies. If you read the article, you'll see that the parable kind of falls apart in terms of its applicability to Maven Central. In many of the retellings, the blind men get into an argument: each feels that his tiny appreciation of the elephant is the only truth, and they come to blows. I don't think that has actually happened in the Java community. But I really enjoy the parable of the blind men and the elephant because I'm not one of the blind men. I'm someone who has known this elephant for the past 12 years and has been tasked with taking care of it. This illustration is, I think, the most chaotic one in the article, and it's my favorite. Look at all the stuff going on; I think it accurately captures many points of the past 12 years for me. So what have I learned from taking care of an elephant? It's very, very large. It is very, very expensive to house, to feed, and to clean. When it becomes unruly, it is very difficult to calm it down and make sure it doesn't hurt itself and the people around it. When we get to the 2021 year in review, I hope some of these analogies become a bit clearer, but this is absolutely my favorite picture of the blind men and the elephant because it really captures some of my days. There's one person in there who's about to die, and I think, that's me. Not today, but some days. All right, a very high-level architecture, and I apologize if this becomes a little too Java-heavy.
I promise I'll try to keep it high level for this audience, and if you want details, come find me. I mentioned that different people have different experiences; I've categorized them on the left-hand side. First, publishers. Second, repo users: the people who access repo1.maven.org, which is the host name for Maven Central. Third, users of search, who do more research-style component metadata lookups. Publishers interact with a Sonatype product, several instances of it, in fact; I'll get to how we ended up with several instances in the year in review. We use Nexus Repository Manager. It is a caching proxy, but there's also functionality in the professional version we use that ensures a certain level of quality for components before you can publish them. Are your components well-formed? Do you have a complete set of metadata for them? Are you providing sources and Javadocs? Do you have PGP signatures? All of that is enforced by the publication software. This is the area of the Maven ecosystem where my team receives the most interaction with publishers, because we set a non-trivial bar for people to vault over before they can even publish. We're not letting you in unless you meet these requirements. You'll see in the year in review that this has become a source of extra effort for us, but it's effort worth paying, because it ensures some baseline quality in the Java ecosystem. Traveling counterclockwise, you'll see our ubiquitous Jenkins icon. We use a hosted instance of Jenkins for a lot of orchestration; we probably misuse it. At the end of a publishing activity, the bits end up in an AWS S3 bucket, and S3 can easily be hooked up as the origin server for any one of a number of CDNs.
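The publication checks listed above (well-formed components, complete metadata, sources and Javadocs, PGP signatures) can be pictured as a simple gate over a staged deployment. The sketch below is a hypothetical illustration, not how Nexus Repository Manager implements it: the function name, the flat file-list input, and the exact set of required POM elements are assumptions.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Maven POMs use this default XML namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}
# Assumption: the metadata elements a publisher must fill in.
REQUIRED_POM_FIELDS = ("name", "description", "url", "licenses", "developers", "scm")


def check_staged_files(artifact_id: str, version: str, files: set[str], pom_xml: str) -> list[str]:
    """Hypothetical pre-publication gate; an empty result means 'looks publishable'."""
    problems: list[str] = []
    base = f"{artifact_id}-{version}"
    # Main jar, sources jar, Javadoc jar, and POM must all be present...
    for suffix in (".jar", "-sources.jar", "-javadoc.jar", ".pom"):
        name = base + suffix
        if name not in files:
            problems.append(f"missing {name}")
        # ...and each one needs a detached PGP signature (.asc) next to it.
        elif name + ".asc" not in files:
            problems.append(f"missing signature {name}.asc")
    # The POM must be well-formed XML carrying the basic project metadata.
    try:
        root = ET.fromstring(pom_xml)
    except ET.ParseError:
        problems.append("POM is not well-formed XML")
        return problems
    for field in REQUIRED_POM_FIELDS:
        if root.find(f"m:{field}", POM_NS) is None:
            problems.append(f"POM missing <{field}>")
    return problems
```

Given a staging listing where only the Javadoc jar's .asc file is missing, this sketch reports that single signature problem and nothing else.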
The CDN that Maven Central relies on is Fastly. Fastly should be a name known to practically everyone who works on a package repository; it's a top-tier global CDN. We've been with them since very close to their founding, they've been an exceptional partner to Sonatype, and they've grown quite rapidly but also quite stably. The point I want to make about this section of the diagram is that Maven Central is relied on by millions of Java developers. Depending on how their build practices are set up, if Maven cannot download a dependency for your build, your software does not go out. Software does not get published, and other things start to fail. So the choice to go with Fastly, backed by AWS S3, is about going with something that is internet scale. People running a package repository always have in the back of their minds: what do we do if the repo goes down? What if some event takes one piece of the infrastructure down? Relying on both AWS and Fastly means we don't have to worry about that. If S3 goes down, or Fastly goes down, the rest of the internet is down with us, and you've got bigger problems than your Maven build not completing. So repo users are the immediate beneficiaries: you interact with Fastly. This is where mvn clean install hits to get your build to go. From the same S3 bucket, we have a bit more orchestration. I believe this is the AWS architecture icon for Elastic Beanstalk. search.maven.org, which I'm not sure how many people are familiar with, uses an index built from the same bits in the repo origin server, and Elastic Beanstalk serves up the front end.
I'm not going to go into much more detail on search; we might talk a little about its functionality when we get to the forward-looking slides. All right, "Central by the numbers." I usually spend a lot of time on these. It's an elephant, it's big. If we have time, we'll come back to it. 2021, some ancient history. 2021 was interesting for all of us. If you read ahead to the right side of the slide, you'll see I'm going to spend some time on the Log4Shell response. But we had great plans for Maven Central, starting with the team. When I started at Sonatype, Maven Central was something our CTO and co-founder, Brian Fox, handed me: in between the things I need you to do normally, you should take a look at this repo1 thing; there are a few things I can't tend to right now. And here we are today, with Maven Central larger and more important than ever. In 2021, Sonatype came to the realization that the one, maybe one and a half, people asked to keep the lights on in between their day jobs certainly weren't sufficient. So we brought on two DevOps engineers to improve stability across the services I showed you in the high-level architecture. At the same time, Sonatype was undergoing a company-wide adoption of Scrum as an agile methodology. We did a whole mess of backlog grooming and planning, and on February 2 we kicked off Sprint One for the Central team. Then, on February 3, came the announcement that Bintray was shutting down. This is definitely more top of mind for Java developers, so for those of you who aren't: Bintray was essentially a competing open source package repository with a superset of components, that is, components from more repositories than just Maven Central.
They mirrored Maven Central and also provided an alternate means for people to publish to Maven Central. On February 3, they announced that they were getting out of running their repo and would shut it down. Their initial press releases weren't especially crisp about the migration plan, so in the first few days after February 3 there were a lot of questions in the community: What do you mean you're shutting down? Where are people going to get my components now? Where can we go? And we at Sonatype said: well, we run an open source Java package repository. Feel free to publish to us. Here are our instructions, and here is the public Jira project where you can sign up and publish. That started a giant chain of events. I have the link here. The announcement is still up on the internet, and the update says that the repo is still live and running. It's read-only, but they continue to serve their artifacts, which is great. I'm quite relieved that that's where they ended up, at least as of right now. Back in February 2021, there was a lot less clarity. I want to highlight these two charts. On the left, I want to call out the giant stair step from February to March. That represents, I think, a 22 to 23% increase in the bandwidth that users of Maven Central consumed. We had always known that a significant number of people consumed exclusively from Bintray rather than from us, but we could never quantify it. With the announcement that Bintray was shutting down, and people migrating to Maven Central as the place where they could officially consume their artifacts, we saw this increase. The good news is that, thanks to Fastly being such a capable partner and being mindful of how important it is that they keep serving artifacts, we had no issues shouldering this load. So that's the view from the machinery of the internet.
Between Fastly and people running mvn clean install, as long as they could find their artifacts on Maven Central, there was really no blip in quality of service from the Bintray shutdown. This other chart, I hope, captures the human toll of it all. Support for services related to Maven Central comes through a Jira instance that Sonatype runs, issues.sonatype.org. We normally expect January to be a bit depressed, as people are still out on holiday. What we did not expect was this giant leap in February and March, and then another leap upward in April, all related to activity around the Bintray shutdown. First, publishers who knew the shutdown was coming, and very quickly realized what it meant for them, started signing up on our services in droves. The blue series is a specific issue type called New Project, and the number of new projects jumped substantially: it really looks like it doubled from January, stayed there through February and March, and then went up again in April. This other color represents publishing support: issues that aren't just "I'm signing up for a new project." I'll explain this by going back to something we thought about when we saw the announcement, which may be very germane to some OpenSSF topics that folks here have already heard about. In the Maven ecosystem, we uniquely identify components with three coordinates. We don't just have a name, in Maven-speak an artifact ID, and a version. We also have a namespace, the group ID, which is an umbrella over all of them and lets you represent your organization. If you've ever published to Maven Central, one of the more annoying things, and probably one of the more necessary ones, is claiming your namespace with some sort of proof that you are actually serious about publishing.
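The three coordinates described above also determine where a file lives in the repository. As a sketch: in the standard Maven repository layout, the dots in the group ID become directories, followed by the artifact ID, the version, and a file named artifactId-version[-classifier].extension. The Coordinates class and its method names are hypothetical; https://repo1.maven.org/maven2 is the public base path for Maven Central.

```python
from __future__ import annotations

from dataclasses import dataclass

REPO_BASE = "https://repo1.maven.org/maven2"


@dataclass(frozen=True)
class Coordinates:
    group_id: str     # namespace, e.g. "org.apache.logging.log4j"
    artifact_id: str  # component name, e.g. "log4j-core"
    version: str      # release, e.g. "2.17.1"

    @classmethod
    def parse(cls, gav: str) -> Coordinates:
        """Parse 'groupId:artifactId:version'."""
        parts = gav.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected groupId:artifactId:version, got {gav!r}")
        return cls(*parts)

    def path(self, extension: str = "jar", classifier: str | None = None) -> str:
        """Relative path of one file in the standard repository layout."""
        file_name = f"{self.artifact_id}-{self.version}"
        if classifier:  # e.g. "sources" or "javadoc"
            file_name += f"-{classifier}"
        # Dots in the group ID become directory separators; the other coordinates do not.
        return "/".join([*self.group_id.split("."), self.artifact_id, self.version, f"{file_name}.{extension}"])

    def url(self, extension: str = "jar", classifier: str | None = None) -> str:
        return f"{REPO_BASE}/{self.path(extension, classifier)}"
```

For example, Coordinates.parse("org.apache.logging.log4j:log4j-core:2.17.1").url() yields https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-core/2.17.1/log4j-core-2.17.1.jar, which is the kind of request mvn clean install ends up sending to the CDN.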
These days, we ask people to make sure that their group ID reflects a domain they own, or at least control the content for. So for com.jorlina, I would have to say, well, I own jorlina.com, and I would have to submit to our automated process a DNS TXT record containing a specific ID to prove that I own that domain. Publishing to Bintray did not involve any such mechanism; I'm still fuzzy as to what they accepted as proof of organization to claim a namespace. One of the first things that happened, and I think it was Brian Fox, who's still there, was that he came to me and said: Bintray's been around for a few years, and they don't have the same validation process we do. Wouldn't it be possible for someone who knows that a namespace only exists on Bintray to buy the matching domain, which the Bintray publisher never owned, and then sign up on Maven Central? If someone had done that, our automation would have granted them access to a namespace they did not own. More importantly, it would have been a namespace that already had provenance somewhere else, and was mature and in use by someone else. So this jump in human activity represents what we did at Sonatype: we turned off that automation. We deliberately broke our own process for several days to think about the right way to honor ownership from Bintray and ensure a safe migration to a safe landing spot in Maven Central. Ultimately, it involved a few tweaks to the automation: we would check whether the project's namespace already existed on Maven Central, and then check whether it already existed on Bintray. If it did exist on Bintray, we routed the request into a separate manual workflow where we said: it looks like you've been publishing on Bintray. If that's the case, we'd like you to acknowledge that and then follow this process to get yourself signed up.
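The DNS check described above can be sketched as follows. This is an illustration under stated assumptions, not Sonatype's implementation: it assumes the domain is the first two group ID labels reversed (so com.jorlina.tools maps to jorlina.com, and multi-part suffixes such as co.uk are not handled), that the TXT record's value is exactly the sign-up ticket ID, and it takes the DNS lookup as a parameter so it runs without network access. The function names and the OSSRH-12345 ticket format used in the note below are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable

# A TXT lookup takes a domain name and returns the raw TXT strings published for it.
TxtLookup = Callable[[str], Iterable[str]]


def domain_for_group_id(group_id: str) -> str:
    """Assumed mapping: reverse the first two labels (com.jorlina.tools -> jorlina.com)."""
    labels = group_id.split(".")
    if len(labels) < 2 or not all(labels[:2]):
        raise ValueError(f"{group_id!r} does not look like a reversed domain name")
    return f"{labels[1]}.{labels[0]}"


def verify_namespace(group_id: str, ticket_id: str, lookup_txt: TxtLookup) -> bool:
    """True if the domain behind the group ID publishes the sign-up ticket ID as a TXT record."""
    domain = domain_for_group_id(group_id)
    # TXT values are often reported wrapped in double quotes, so strip them before comparing.
    return any(record.strip().strip('"') == ticket_id for record in lookup_txt(domain))
```

With a fake lookup such as lambda d: {"jorlina.com": ['"OSSRH-12345"']}.get(d, []), verify_namespace("com.jorlina", "OSSRH-12345", ...) succeeds. Note that this check alone is exactly what the Bintray concern exploits: whoever controls the domain today passes, regardless of who historically published under the namespace.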
I won't linger much more on this slide, but over the next three months we measured how many projects migrated from Bintray, either by people successfully signing up on their own or by us asking them to migrate, and we counted about 700 projects in those three months. A lot of that was handled by human effort, because we wanted to be sure we were doing the right thing. We purposely turned off the automation to make sure we understood the scope of the problem and had a viable and equitable approach to a solution. Before I move on to the middle of the year, the last thing about the Bintray shutdown I'd like to call out is this: when everybody decided they needed to publish their artifacts anew on Maven Central, they had to publish them through one of our Nexus Repository Manager servers, and we effectively got DoSed a little bit. We had not anticipated that Bintray's popularity would turn into that many new requests. So between February 3 and February 25, we built up documentation, stood up new resources, and essentially built a new server with zero tenants on it. On February 25, we changed our process so that by default we would provision people on the new, unloaded host. Leading up to February 25, we had continuous complaints from people saying, my builds are failing, they're timing out. Unfortunately, we had to explain to them: yes, it's because you and everybody else from Bintray need a new home, and we did not realize we were going to need to scale this quickly. Keep this in the back of your heads; we'll revisit it at the end of 2021. All right, in retrospect I'm really glad I followed David Wheeler's talk, because of the question about SBOMs.
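The Bintray-aware triage and the February 25 change in default provisioning might be sketched like this. The route names, the host labels, and the decision to send namespaces already on Central to their existing owners are assumptions for illustration; the namespace lookups are passed in as functions rather than querying any real service.

```python
from __future__ import annotations

from datetime import date
from enum import Enum
from typing import Callable

NamespaceCheck = Callable[[str], bool]


class Route(Enum):
    EXISTING_OWNERS = "namespace already on Central: ask its current owners"
    MANUAL_BINTRAY_REVIEW = "namespace seen on Bintray: manual migration workflow"
    AUTOMATED = "automated domain verification"


def triage_new_project(group_id: str, on_central: NamespaceCheck, on_bintray: NamespaceCheck) -> Route:
    """Hypothetical routing for a new-project request, checking Central first, then Bintray."""
    if on_central(group_id):
        return Route.EXISTING_OWNERS
    if on_bintray(group_id):
        # The namespace has history elsewhere: never grant it on domain proof alone.
        return Route.MANUAL_BINTRAY_REVIEW
    return Route.AUTOMATED


# Hypothetical labels for the original server and the one stood up in February 2021.
LEGACY_HOST, NEW_HOST = "legacy-host", "new-host"
NEW_HOST_DEFAULT_FROM = date(2021, 2, 25)


def default_host(provisioned_on: date) -> str:
    """From February 25, 2021 onward, new publishers land on the new, unloaded host."""
    return NEW_HOST if provisioned_on >= NEW_HOST_DEFAULT_FROM else LEGACY_HOST
```

The ordering is the point of the design: the cheap automated path is only reached once both history checks come back empty.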
This is something we turned on in mid-May of last year: when you sign up to publish to Maven Central, you are automatically opted in to a workflow that calculates an SBOM. It's not a perfectly complete one; it covers the other open source components in your build that are sourced from Maven Central. After a successful release of your build on our servers, we send you an email summarizing the vulnerabilities, other weaknesses, and potential license threats in that minimal SBOM, and if you click the link, you get a detailed report. David said the SBOM is not a silver bullet, right? Just because it says what's in there doesn't mean you know what to do with it. We're trying to add a little bit of silver here: we may not know where the fixes are yet, but we know you're consuming this version that's vulnerable, or this version whose license is maybe a little copyleft-ish, so you might want to take a look at it. This is free for anyone who publishes to Maven Central through OSSRH, and it will continue to be free for as long as we run a publishing stack for Maven Central. All right, I think I'm still on time. Here we go: it's the end of 2021. People are getting ready for the holidays, and guess what? No one's going to take a holiday, because the internet's about to break in a horrible way. It takes me some effort now to remember what happened at the end of last year, so bear with me while I tease everything apart. First, we had to upgrade all of our software. As soon as the Apache Software Foundation cleared up which versions were not vulnerable, we made sure that all the running services at Maven Central were upgraded to patched versions of Log4j. That didn't take too much time.
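As a sketch of what one check in that minimal-SBOM report might look like, the code below treats the SBOM as a list of groupId:artifactId:version strings and flags log4j-core versions older than the releases Apache recommended after the December 2021 fixes (2.17.1 for Java 8 and later, with the 2.12.4 and 2.3.2 backports for Java 7 and Java 6). The function names and the single hard-coded rule are hypothetical; the real report draws on vulnerability and license data that this sketch does not model.

```python
from __future__ import annotations

import re

LOG4J_CORE = ("org.apache.logging.log4j", "log4j-core")


def release_tuple(version: str) -> tuple[int, ...]:
    """Numeric release part only: '2.0-beta9' -> (2, 0), '2.17.1' -> (2, 17, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def log4j_needs_upgrade(version: str) -> bool:
    """True if a log4j-core 2.x version predates Apache's recommended fixed release for its line."""
    v = release_tuple(version)
    if v[0] != 2:
        return False  # log4j-core releases are 2.x; anything else is out of scope here
    if v[:2] == (2, 3):
        return v < (2, 3, 2)    # Java 6 backport line
    if v[:2] == (2, 12):
        return v < (2, 12, 4)   # Java 7 backport line
    return v < (2, 17, 1)       # Java 8 and later


def summarize(sbom: list[str]) -> list[str]:
    """Return one finding per SBOM entry that needs attention."""
    findings = []
    for gav in sbom:
        group_id, artifact_id, version = gav.split(":")
        if (group_id, artifact_id) == LOG4J_CORE and log4j_needs_upgrade(version):
            findings.append(f"{gav}: upgrade log4j-core (Log4Shell and follow-up CVEs)")
    return findings
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.9.1" would sort after "2.17.1" and slip through.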
What ended up consuming all the time was this: even though we had had ten months to move people to our new, unloaded server, it is amazing how all the people still on the old server decided at the same moment that they had to upgrade all of their dependencies and publish new versions of their software. So the same timeouts and the same failures that publishers had to deal with in February, we dealt with all over again in December. And this time it felt much worse. oss.sonatype.org, the main host people published to, was underwater for three straight days. We had to get volunteers from other development, DevOps, and SRE teams at Sonatype to staff our Jira and say: hey, we saw you're having trouble publishing to oss.sonatype.org; would you like to migrate to the new host? I remember five people showing up to a Zoom call on a Saturday morning, and I had to tell them, this is just in case more people need to migrate. But we did it. We got people moved to the new host, and thankfully we had learned lessons in February about how to scale quickly and what process to follow to move them. The slides themselves don't illustrate any of that. These charts are from our Log4j vulnerability resource center, where we continue to calculate stats. Vulnerable versions of Log4j were still being downloaded; that's not the fault of the Log4j project, it's just that people haven't gotten the message and upgraded. All right, I think I have five minutes left to quickly go through future direction. I mentioned that the Central team is actively building for the future. We've got, I believe, two front-end developers, two back-end developers, and a tech lead, all building the new Central portal, which we hope will be a centralized place not just for consuming metadata about artifacts, but also for publishing. To that end, we'll need to build a better sign-up and identity management process.
This is something we care about very strongly, especially in light of all the issues we've seen recently: do you really belong to this project? Should you have access to this group ID? So organizational identity management is definitely top of mind. We're also trying to launch new data products, including component popularity and categorization. We're still working out how popularity is defined. Is it raw downloads? Is it how alive the project is? We have a research and analytics team at Sonatype that has categorized many of the most popular open source projects, and the screenshots illustrate that. But before I get to the screenshot, I want to give a shout-out to OpenSSF and the Securing Software Repositories Working Group. I've had the pleasure of attending a couple of those meetings now. I believe they're every two weeks, alternating between an EMEA-friendly and an APAC-friendly time, and I stay up late to hang out on the APAC call. They've all been incredibly welcoming, and it's been wonderful to build empathy with them. The problems we saw in 2021 aren't unique to us. There are certain things we were better prepared for, but we are all learning from each other, and we're all building toward a future where we share the same responsibilities and, with luck, leverage the same solutions: SBOMs and all the tooling that's available. The last thing I have here is a screenshot from our staging server. It has sections for the most popular packages in the last 90 days, the most popular namespaces, and popular categories. So far it looks a lot like search.maven.org: you can look things up by group ID, artifact ID, and version, and see some details on recent releases. But this will eventually be the one-stop shop where you can consume information about artifacts, publish your new artifacts, and see a transaction log of all your artifacts.
I'm supposed to mention that Sigstore is going to play a huge role in this publishing process. Right now we require PGP signatures, but that's all decentralized, and the tooling around verification is something we've never been great at. It's great to be working with everyone at Sigstore to have something centralized that we can all leverage across ecosystems. I have a slide in case people are interested in helping; I think all the presentations have a "how you can help." If you'd like to contribute to the future phase and requirements for Central, we have a Google form for you to sign up at central-beta.sonatype.org. And since I'm here, and I have maybe a minute or two, I can take one or two questions now, or outside during the coffee break. I'll leave this slide up, which is for an upcoming Sonatype event, All Day DevOps. It's a thing, and there are folks here who were instrumental in building it. Yes, in the back, Mr. Wheeler. Well, it's amazing, because that's actually one of the first things we looked at. We did something less involved: we do ask people to add a DNS TXT record, but it contains the same ID as the issue they opened to sign up. I feel that if we'd had more time to build on our legacy stack, we probably would have taken that track and been more ACME-like. But for the future, I absolutely believe we'll need to leverage something that already exists. All right, I think I saw your hand first, and then we'll go to you. So the question is whether we plan to do any extra validation, not just of ownership of the name, but of the people who own the name, maybe the trademark, right? I think that's certainly on the table, and that's always the question that comes up: great, you've just asked me to jump through another hoop. Well, what if my domain has expired?
So I'll share something with folks here that isn't apparent in our documentation. We've been fortunate in that we come from a historically very manual process. If you were the first person to sign up for a namespace in Maven Central, you automatically became the person who is allowed to approve other people signing up for it, and we turn off the automated validation in that case. Where there's a long history and we know who is actually signing up, that has served us really well. But I think this is only going to become more critical moving forward: getting at the root of identity, not just through domain ownership but through true corporate or organizational identity, is something we'll need to look into more. Yes? They are already supported. It's actually the quickest path to publishing to Maven Central. Sign up for io.github.jorlina, and we'll ask you to create an empty repo named with your OSSRH ticket ID, and you'll be provisioned automatically. We also support this for Bitbucket, GitLab, Gitee, which is a Chinese code-hosting service, and one more that escapes me right now. So you don't need to own a domain to publish to Maven Central, just a GitHub account. I do feel there's lots of opportunity there, and we're already looking at using whatever GitHub exposes in terms of organizational management and authorization. That may help us too: "verified by GitHub" might be the thing where we say, GitHub has verified you, so we should probably trust that as well. Ladies and gentlemen, Joel Orlina. Thank you.
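The GitHub route described in that last answer can be sketched the same way as the DNS check: derive the account from an io.github.* group ID and confirm that a repository named after the sign-up ticket exists under it. The function names and ticket format are hypothetical, the repository check is injected rather than calling GitHub's API, and only the io.github prefix is modelled, even though the answer notes that Bitbucket, GitLab, and Gitee are also supported.

```python
from __future__ import annotations

from typing import Callable, Optional

GITHUB_PREFIX = "io.github."


def github_account(group_id: str) -> Optional[str]:
    """'io.github.jorlina' (or 'io.github.jorlina.tools') -> 'jorlina'; None for other namespaces."""
    if not group_id.startswith(GITHUB_PREFIX):
        return None
    account = group_id[len(GITHUB_PREFIX):].split(".", 1)[0]
    return account or None


def verify_github_namespace(group_id: str, ticket_id: str, repo_exists: Callable[[str], bool]) -> bool:
    """True if the account behind the group ID has a repository named after the ticket."""
    account = github_account(group_id)
    if account is None:
        return False
    # Creating a repository under the account demonstrates control of that account.
    return repo_exists(f"https://github.com/{account}/{ticket_id}")
```

In practice repo_exists would wrap an HTTP or API call; keeping it injectable lets the same logic run against a fixed set of URLs.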