Sorry, is that better? OK, cool. So most of the issues in the Top 10 list are about how you actually construct the application. They're things like protecting against SQL injection attacks, protecting against cross-site scripting, all that sort of stuff. But there's one in particular that's not really about the way you build the application itself; it's about the way you manage its dependencies. And that's point nine, using components with known vulnerabilities. This issue basically means that update management becomes a key security concern. That's not a new problem, but it becomes even more critical when you can't rely on perimeter defense: if you're running a public web service that's sitting on the internet, it can and will be attacked. Now, I work on software supply chain management for Red Hat, and update management is a thing that spans the entire supply chain. Publication, redistribution, deployment: these are all issues that matter in security management. It doesn't matter if a new release is available that solves a security bug if you haven't actually deployed it into your running service. The traditional Linux approach to this is to build hardened bunkers: you minimize your attack surface, minimize the patches needed, and very carefully manage your updates so that a security fix doesn't break anything. This has served us well for a long time. But automation actually enables a new model for handling this problem, and that's the moving target. What we're able to do is regularly discard our old instances of services and create new ones from scratch. With continuous integration and automated testing, we can eagerly upgrade to new versions of our dependencies and design our testing and deployment systems accordingly. And there's actually a third option here, which is to say, I'm not going to do either of these, and make yourself a sitting duck in a tin shed.
Don't do that. You need to do one or the other: the hardened bunker or the moving target. Now, the hardened bunker era is when the Linux distros were really born, a time when publication of open source software meant posting a tarball on your website. Packagers for the distros then took those tarballs and converted them to more manageable formats like RPMs or debs. And automated testing was a luxury, something most upstream projects couldn't afford, because it required running your own servers and a huge investment in automation; it just wasn't practical for most open source projects. If you were lucky, they had a test suite that the developers themselves could run before they put the tarball up. What this meant was that regression testing and integration testing were left to the commercial redistributors, and hence the long-term support model for Linux distros was born. That model was basically about providing low-risk security updates for a core set of critical components. But times change, and the cost of running web services has dropped dramatically over the last 10 years. What this means is we're now seeing a whole pile of free-for-open-source-projects continuous integration services: things like Travis CI, GitLab CI, and so on. So it's quite common now for open source projects to require their tests to pass not only before they do a release, but even before they accept a change into the source tree. In addition, most modern language communities now have common publishing platforms. Rather than publishing releases on their own personal websites, they're posting to the Python Package Index, they're posting to rubygems.org, they're posting to npm. And these improvements in automation mean that previously impractical things are now possible.
So I know the title said release-monitoring.org, but for a moment we're going to talk about libraries.io. libraries.io is a freely available web service that does developer-focused upstream monitoring. It was launched back in March 2015, and it now monitors more than two million open source projects. That's a lot of software. The way it achieves this is not by monitoring individual project websites, but by monitoring the publishing platforms: just by providing backends for thirty-odd publishing platforms, they found more than two million published open source projects. Now, compare that to some historical statistics. One of the largest previous catalogs of open source software was Open Hub, previously called Ohloh; that's been around since 2006 and tracks about 700,000 open source projects. Freecode, formerly Freshmeat, stopped getting updates in 2014; at that point it had been tracking around 50,000 projects. Debian GNU/Linux, generally considered one of the largest Linux distributions, has been around since 1993, so coming up on 25 years: around 50,000 projects. Fedora, around since 2003, so coming up on its 15th birthday: around 20,000 projects. So what we're seeing here is that libraries.io, in two years, is already tracking more than three times as many projects as the largest previous catalog, and two orders of magnitude more than we'd find in a typical Linux distribution. One of the big factors here is that upstream monitoring has historically been an opt-in thing on the part of packagers: you registered the projects you were interested in following, rather than just trying to track everything available on the web. And that extra step means that lots of stuff is missing. So, Linux in the moving target era: the hardened bunkers aren't going away. There are always going to be hardened bunkers.
You're not going to be live-updating a submarine at six o'clock every day. But the majority of software is moving towards this moving target model. It's a case of: just because you've released it into the world doesn't mean it's no longer your problem. Services like resin.io mean that even device manufacturers retain the ability to ship security updates, and so on. So what this means for the Linux distributions is that our role changes. No longer are we just responsible for QA of our own stuff; we need to be asking, how do we help people QA what they're building on top of our platforms? And that's where we get to release-monitoring.org. Where libraries.io is focused specifically on developers, on helping application developers know when their components are up to date, release-monitoring.org focuses on the challenges of redistributors: the fact that, as redistributors, we need to know that components have been updated, but we also need to be able to integrate that into our downstream pipelines and our other tools. So release-monitoring.org maintains upstream/downstream mappings and aims to feed into systematically automated update pipelines for distributions. Now we'll move on to actually showing you what the heck it is I'm talking about here, because that's all very fine and abstract, but ultimately you need to get things working. Yes, so the bit that... Can you repeat the question? Oh, so the question was whether we automatically maintain the upstream/downstream mappings. The answer right now is no; we'll get to the need for more automation later. We actually try to track as little as possible, and refer to the upstream wherever we can. I just need to find the right window here. So this is currently running locally, because, A, trusting conference Wi-Fi seemed like a bad idea,
and, B, demoing the registration of new projects over fedmsg is easier to control when it's running locally. So this is basically what you'll see if you go to release-monitoring.org: details of what we're currently tracking, and information on how to connect to it. One of the main things it shows is how to connect to it with fedmsg and start tracking it for updates. And what we're looking at here is all about automation: in many ways, the interesting part of Anitya isn't so much the web service itself as it is the ability to listen for messages from the service. Can people read that? So yeah, we have a client there listening to the message bus. And if we go back to our web application over here, what we can do is... oh, it would help if I actually logged in, wouldn't it? Bring this over here. I'll do that. Sorry about that. There we go. So once you've logged in (Fedora and various other OpenID providers will let you log in), you can add a project here. We'll use requests as our example, and it's a Python Package Index project, so that automatically simplifies a lot of things. We can ask it to check for the latest release. And this will be a Fedora project. So you're seeing some of the need for improved automation here. And yeah, we can see it's registered that, it's gone and checked for the latest version, and, as noted, there's a downstream mapping to Fedora here. If we go over to our monitoring window, we can see that these various messages have come through. fedmsg-tail is one of the easiest ways to listen to the Fedora message bus, and we can see here that, because I'm running this locally, it's sending me a dev message rather than a production one. It's telling us a new project has been added, a new version update has been found, a new distro has been added (because this was an empty instance, so Fedora is new), and we've got the mapping.
We've got the new upstream/downstream mapping. So if we actually go and look at what some of that looks like (you probably can't read that), this is an example of monitoring for new upstream/downstream mappings, and it's telling us that we have an upstream/downstream mapping in Fedora for the python-requests project. And then over here... oh, did I just run the wrong command? Oh, yeah, no. In this window, we have an example of just listening for version updates, and this is the one that's particularly interesting for release monitoring: you get the version updates, and it tells you various things about the new versions that have shown up. And then finally, one of the basic things we can do, rather than doing anything dynamic, is just query directly: please tell me about the downstreams of a given project. So that's all the basic stuff. release-monitoring.org has a lot more data in it, but those are the basics of the operating service. So, the pieces we're talking about here. Anitya is the web service itself. That's a Fedora Infrastructure project, online on GitHub, and it's basically a Flask plus SQLAlchemy application. It's currently running on Python 2; it does run under Python 3, which is supported for local development purposes, and it would be nice to get production onto 3, but that will happen sometime. The key concept in Anitya is that it tracks upstream projects: the original publishers of a piece of open source software. They're uniquely identified by their homepage URL, which turns out to have a few problems that I'll talk about later. And that covers both open source libraries, the components targeting developers building other, larger applications, but also full open source applications themselves, things like LibreOffice and so on.
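To make the message-listening part of the demo concrete, here is a small sketch of what a consumer of those Anitya version-update messages might look like. The topic suffix and payload fields (`project`, `name`, `version`) are illustrative assumptions based on the demo, not the exact production schema, so check the messages on your own bus before relying on them.

```python
# Sketch of filtering Anitya version-update messages from a fedmsg stream.
# Topic and payload layout are assumptions for illustration.

ANITYA_UPDATE_TOPIC_SUFFIX = ".anitya.project.version.update"

def extract_update(topic, msg):
    """Return (project_name, new_version) for version-update messages, else None."""
    if not topic.endswith(ANITYA_UPDATE_TOPIC_SUFFIX):
        return None
    project = msg.get("project", {})
    return project.get("name"), project.get("version")

# Against the real bus you would wrap this in something like:
#
#   import fedmsg
#   for name, endpoint, topic, raw in fedmsg.tail_messages():
#       update = extract_update(topic, raw["msg"])
#       if update:
#           print("New release: %s %s" % update)

if __name__ == "__main__":
    # Hypothetical sample message mimicking the demo in the talk.
    sample_topic = "org.fedoraproject.dev.anitya.project.version.update"
    sample_msg = {"project": {"name": "requests", "version": "2.18.4"}}
    print(extract_update(sample_topic, sample_msg))
```

The point of splitting out a pure `extract_update` helper is that the filtering logic can be tested without a live message bus.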
Things that a developer isn't going to integrate as part of their application, but a distro may want to integrate as part of the distribution. One of the other key pieces is the monitoring backends. The monitoring backends are the things that actually look for new releases, and the lowest common denominator backend is one where you just feed it a web page and say, here is the regex to look for new releases on this page. That's for the projects that are still publishing in the classic model: we have a project website, we put the tarballs here, and if you want a new release, look for a new tarball. Something that's a relatively new addition to Anitya, as in the last six months or so: for a very long time, Anitya didn't actually model things like the Python Package Index or RubyGems as entities in their own right; each was just a different kind of backend. It turns out that with a publishing platform, you can actually make a few more interesting assertions, like the fact that any given name on PyPI will only belong to one project, so Anitya shouldn't allow multiple projects to be registered under the same name in the same backend. So for things like CPAN, RubyGems, PyPI, and npm, they now get modeled as ecosystems, and we can start applying more interesting rules about how those work. Downstream distributions are currently modeled really simply: they're basically just the name of the distribution. Typically these are Linux distributions, but it turns out there are actually some other interesting things we may be able to monitor. For example, the CentOS container pipeline folks are considering modeling themselves as a distribution, and that will then let them get updates whenever one of their containers needs to be rebuilt because of a component update. So the definition of a distribution in Anitya is actually quite flexible, and that's looking like it will prove useful for various things.
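The "feed it a web page and a regex" backend described above can be sketched in a few lines. The project name and regex here are hypothetical placeholders; in Anitya the per-project regex is supplied when the project is registered with the generic web-page backend.

```python
import re

def latest_version(page_html, version_regex=r"example-(\d+(?:\.\d+)*)\.tar\.gz"):
    """Scan a download page for tarball links and return the newest version.

    'example' is a made-up project name; the caller supplies the real regex.
    """
    versions = re.findall(version_regex, page_html)
    if not versions:
        return None
    # Compare numerically, not lexically, so that 0.10 sorts after 0.9.
    return max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))

page = """
<a href="example-0.9.tar.gz">example-0.9.tar.gz</a>
<a href="example-0.10.tar.gz">example-0.10.tar.gz</a>
"""
print(latest_version(page))  # prints 0.10
```

The numeric comparison is the part that's easy to get wrong: a plain string sort would call 0.9 newer than 0.10, which is exactly the kind of subtle breakage a shared service can fix once for everyone.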
And then, yeah, the upstream/downstream mappings are the... Yeah, yep. Not currently, so it's because this is... Oh, sorry. The question was whether the distributions themselves are currently versioned. Because this is mostly an update subscription mechanism, no: it's a case of "we're interested in this upstream, please tell us when it changes", and that's the main thing the upstream/downstream mappings are useful for. There are some interesting questions around that, which we'll get to later in terms of possible future changes. So then, the upstream/downstream mappings: it would be nice if every distro had consistent naming schemes and we always followed them for everything. We don't. Naming schemes can get you a certain part of the way in terms of matching upstreams and downstreams, but ultimately what you need to do is build a database that says "this is that". And that's actually one of the key pieces in Anitya: just having a central place to collaborate on building those mappings. Then finally, the piece that makes all of this interesting is the event notification system. We don't want people having to go, "oh, have they done a new release? Let's check their website. Oh, I should update my package." The whole point of this is to get us out of that mode and automate as much as we can, so that reviewers focus on reviewing publishers, saying "yes, we trust these people to produce good releases", and then we automate as much as we can beyond that. So, who's heard of fedmsg, the Fedora message bus? Probably about half the room; that's a much higher percentage than I got at linux.conf.au. So fedmsg, for folks that haven't heard of it or haven't looked into how it works, is a ZeroMQ-based brokerless messaging protocol. There's a reference implementation in Python using Twisted, which is what Fedora mostly uses. Originally, the name was short for Fedora Messaging.
Once Debian started using it, that name seemed kind of inappropriate, so it has been rechristened as federated messaging. If you find the old name anywhere, it should be changed. Again, that lives in the Fedora Infrastructure group on GitHub. One of the big things about fedmsg: a lot of messaging systems require a lot of centralized coordination and are very permission-based in terms of people getting onto the message bus. The whole thing about fedmsg is that it is truly federated: if you need to ask for permission to do something, it's probably gone wrong somewhere. fedmsg just uses DNS for service discovery, so it's pretty much just "here's the endpoint to connect to if you want to get the message stream". With that kind of open, permissive system, you do need to worry about namespace collisions, and this uses the traditional solution for that problem: piggyback on DNS and prefix your topics with a DNS name that you control. Fedora puts most of its messages under the fedoraproject.org prefix, and because release-monitoring.org is actually run out of Fedora Infrastructure, it currently uses that too. A case could be made that we should be using releasemonitoring.org instead, but that's a question for the future. Again, with such an open publication system, the question of whether you can trust the messages comes up. For a lot of systems, if you just use the message as a trigger, you can go back to the original service for the details, and then you're relying on SSL certificates for your authentication. But fedmsg does have source authentication built in directly: you can do signed messages with either GPG or X.509. Messages are validated by default, and every message that fails validation is ignored; you can turn that validation off for local development.
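The "piggyback on DNS" topic scheme is simple enough to sketch. The exact layout (reversed domain, then environment, then service, then event) is my reading of the convention used by Fedora Infrastructure; treat it as an assumption and check the fedmsg documentation for your own deployment.

```python
def build_topic(domain, env, modname, event):
    """Build a fedmsg-style topic from a DNS name you control.

    Assumed layout: <reversed-domain>.<environment>.<service>.<event>.
    """
    # Reverse the DNS name so topics sort hierarchically: fedoraproject.org
    # becomes the prefix org.fedoraproject.
    prefix = ".".join(reversed(domain.split(".")))
    return ".".join([prefix, env, modname, event])

print(build_topic("fedoraproject.org", "prod", "anitya", "project.version.update"))
# prints org.fedoraproject.prod.anitya.project.version.update
```

Because the prefix is a DNS name the publisher controls, two organizations can share a bus without ever coordinating on topic names, which is the whole point of the federated design.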
And one final piece of fedmsg that's quite important. fedmsg is brokerless by design, because if you're doing simple messaging between components, adding a broker is just another piece that can fail; so by design, you don't need one. However, brokers are quite useful in many cases: for example, you can have many event sources hiding behind a single endpoint, or a single relay that listens to your event sources while your application just consumes that one relay. fedmsg-relay fills that role. Instead of everything having to listen to everything, the relay lets you consolidate things at different points in your network. That's particularly relevant to Anitya, because Anitya assumes it's listening to a relay rather than directly to other services, and assumes it's publishing to a relay, for that matter. So that's what exists today. There are definitely some limitations to the current system, and this is both where we're going to go anyway and where people could definitely get involved and help move things along. One of the biggest problems with the current system is that a lot of it is quite manual and involves humans clicking buttons in user interfaces, which, as I said at the start, we have learned does not scale, and particularly does not scale to the magnitude of modern open source development. The API on the current service is entirely read-only; what we want to do is open that up. OpenID Connect will allow us to grant people access with their Fedora Account System credentials, authorize automated clients, and essentially let us do things like fully populating all of the Fedora mappings. Well, I'll actually get on to the next slide: fully populate all the Fedora mappings, fully populate all the Debian mappings, and try to get the number of projects tracked up to match what we see in libraries.io.
And basically, it should be the case that if it exists, we can track it and we can report it. So we don't use the spec file metadata... oh, sorry, the question was how we do that based on the spec file metadata. The answer is we don't. What we do is look at the upstreams and their source artifacts, then look at the downstream, and actually do artifact matching. No, no, no: for the upstreams, you basically scrape the web, and you do this via the publishing platforms. You can quite straightforwardly go to PyPI and say, tell me about everything you know about. And libraries.io itself will actually be incredibly useful for this. But yes, it's a case of automating things and then letting your web scrapers go do their thing. I hadn't thought about it, but something you actually could do with this eventually is use it to start looking at your spec files and asking, what's still up to date? That would be really helpful, yeah. OK, so that's not currently in here, but yes, checking spec files and updating that kind of metadata is something that could definitely be done. So, currently the public data sets are missing the ecosystem details. We found some interesting problems recently where release-monitoring.org had duplicate data in it, and adding the ecosystem modeling actually let us find and correct that. That will then let people query release-monitoring.org for downstream details. Something we're looking at actively is consuming the release feed from libraries.io. That publishes a firehose API as a server-sent events stream, and Jeremy Cline recently wrote an SSE-to-fedmsg bridge: you give it a server-sent event stream to listen to, and it will turn those events into fedmsg messages. So we're pretty sure we should be able to get that set up and hence build a bridge into release-monitoring.org based on that data, which will be really cool and will help with that discovery problem as well.
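To give a feel for what an SSE-to-fedmsg bridge has to do, here is a minimal sketch of parsing a server-sent events stream into discrete events. This is a deliberately simplified parser: real SSE also handles `id:` and `retry:` fields, comments, and multi-line `data:` accumulation rules, all omitted here, and the sample payload is made up.

```python
def parse_sse(stream_text):
    """Parse a server-sent events stream into a list of (event, data) pairs.

    Simplified sketch: events are separated by blank lines, and each line
    is a "field: value" pair. Unknown fields are ignored.
    """
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if not line.strip():
            # Blank line: dispatch the accumulated event, if any.
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:
        events.append((event_type, "\n".join(data_lines)))
    return events

sample = 'event: version_update\ndata: {"name": "requests"}\n\n'
print(parse_sse(sample))
```

A bridge would then take each parsed `(event, data)` pair and republish it on the message bus under an appropriate topic.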
Far more speculatively, and this gets back into downstream version tracking and distro versioning: at the moment, all Anitya tracks is the name mapping. It says this upstream project is called this in this particular distribution. That's incredibly useful in its own right, because you can then go into the distro systems to find out the downstream details. But one of the interesting things here is that you can potentially say, hang on, this version is available upstream; what are the various distros shipping? What are different versions of the distro shipping? The big question mark here is not whether that would be interesting and useful, but whether it should be in Anitya. My own inclination is probably not; it should probably be something else that consumes the Anitya data. But at the same time, the scope of what we can automate here, getting the computers to do the heavy lifting for us, is quite remarkable. So hopefully I've convinced you that we do need to do that work, that open source has outgrown our ability to track it manually. Which is, to be honest, a frankly nice problem to have. Ten years ago, the idea that open source would expand beyond our ability to track it by hand is probably not a problem a lot of people thought we'd have. But it has, and so we need to change the way we do things. Thank you all. Any questions? Yep. So, how often do we scrape the upstream project pages? By default, it checks every 24 hours. One of the things we're trying to make happen with the publishing platforms is that most of those publish a feed of events, of new releases, so we want to consume those and use them both for project discovery and for more prompt notification of new releases. Do we track security advisories separately from regular releases? No, we don't; release monitoring doesn't incorporate anything CVE-specific. So, is this about the GitHub backend in Anitya? Yeah.
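The "what are the distros shipping versus upstream" idea above is the kind of tool the talk suggests could be built on top of the Anitya name mappings. Here is a rough sketch of that comparison; the distro data is made up for illustration, and the naive dotted-number parsing ignores pre-releases, epochs, and distro-specific version syntax.

```python
def parse_version(version):
    """Split a plain dotted version string into a tuple of ints for comparison.

    Naive on purpose: no pre-release tags, epochs, or distro suffixes.
    """
    return tuple(int(part) for part in version.split("."))

def outdated_downstreams(upstream_version, downstream_versions):
    """Report which distros ship something older than the latest upstream."""
    latest = parse_version(upstream_version)
    return sorted(
        distro for distro, shipped in downstream_versions.items()
        if parse_version(shipped) < latest
    )

# Hypothetical snapshot of what a few distros might ship for one project.
shipped = {"Fedora": "2.18.4", "Debian": "2.12.4", "CentOS": "2.6.0"}
print(outdated_downstreams("2.18.4", shipped))  # prints ['CentOS', 'Debian']
```

As the talk notes, the hard part isn't this comparison; it's reliably gathering the downstream version data, which is why it probably belongs in a separate service that consumes the Anitya mappings.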
So I'm not personally familiar with the GitHub... sorry, the question was whether Anitya's GitHub backend could be improved to use the GitHub API. I think at the moment we don't have the ability to do proper token management for using the GitHub API, so I suspect it's more likely we'll deal with that through the libraries.io feed, because they're actually doing some really interesting work abstracting across GitHub, Bitbucket, and GitLab, basically building a common data model across those three hosting services. So it's looking like we'll be able to do a lot of work with them and reuse their work rather than duplicating it. Any other questions? So, it becomes the case that you then... oh, sorry, the question was how we deal with cases where we're tracking the publication platform, but the project stops updating the publication platform and switches to only publishing through GitHub or whatever. release-monitoring.org itself won't solve that problem, because we're working on the basis that if they're using a publication platform, that's where the users are expecting to find it, so they'll keep publishing that way. It's then a larger data-mining problem to say, well, they've been publishing here, we've stopped getting updates there; have they moved to a different publication platform? Have they done this, that, or the other? You're basically getting to the point of doing meta-analysis on the metadata. Yes, it can be automated; no, release-monitoring.org won't do it. Yes. Yes, yeah. Oh, sorry, the question was how we deal with things like forks and project splits, where you're tracking the original release but there are now additional components being published elsewhere. Yeah, this only deals with the question of, for things you already know about, has a new release been published where it was expected to be?
Yeah, there are all sorts of fun community tracking problems beyond that. There's some really interesting stuff in terms of tapping into things like the GitHub BigQuery data and looking at how software is being consumed, and yeah, there are lots of cool possibilities for automation here. Okay, cool. Awesome, that's why we talk about it. Okay, thank you everybody. Time for one more? I didn't know Core Infrastructure were doing data gathering, so if they are, I'd be very interested to hear about it. Actually, I wanted to go to that talk, but I was otherwise occupied. Okay, I will... yes, I would be very interested in hearing more about that. Thank you all.