Hello everybody, happy Monday and welcome to another OpenShift Commons briefing. As we like to do on Mondays, we have an AMA with one of the many upstream projects. Today it's Clair, which has long been associated with Quay and Project Quay. We're going to get an update on that project from three of the key folks on that team: Hank Donnay, Alash, and Louis De Los Santos, who are going to take us through what's going on in the Clair community and give us an update on the latest release. So ask your questions in the chat. We'll have live Q&A at the end, and we'll try to answer as we go in the chat as well, so feel free to post questions there. Without any further ado, Louis, take it away, introduce your cohorts, and tell us all about what's been going on in the Clair community. Today, Hank, Alash, and I will be presenting the recent work we've been doing to rejuvenate the Clair application and its community. I am a principal engineer on Clair, along with Hank, and Alash works on EXD Cloud in the PnT organization. In this talk, we're going to go over what Clair is, why you should care about Clair, how Clair works internally, how to move from Clair v2 to Clair v4, where Clair is being used today, the v4.1 roadmap, and how to contribute, and then we're going to end with an ask-me-anything. So what is Clair? Clair is a set of scalable services for container security. It can be used by both developers and operations to understand any vulnerabilities that might affect your container builds. It's open source and it's community developed. So why does Clair matter? As most of you know, we're moving to distributing applications in containers. Before this, you would more likely deploy applications onto servers, using Ansible or some other configuration management tool to put your application there. If those deployments had vulnerable dependencies, the splash damage was confined.
They were confined to the servers you deployed on. Now that we're shipping containers as the default way to package and deploy applications, we're pushing those vulnerabilities to anywhere those containers run. So it should be apparent that your organization's posture around container security becomes pretty important. There's also something to be said about recent supply chain attacks like the SolarWinds hack. When you're moving to a centralized repository where your artifacts go for deployment, that repository itself can become subject to attack, as you're probably aware if you've been following any of that. So what's new in Clair v4? Previously, in Clair v2, the API was layer-based: you were responsible for handling the parent-child relationships of the layers yourself, which became a little cumbersome. So we moved to a manifest-focused API. As you can imagine, the manifest structure itself expresses the parent-child relationship of the layers, so the client no longer needs to. It's much more intuitive to deal with that schema than to handle the layer pushes yourself. Clair v4 has content addressability as a first-class citizen. I put a little star there because it plays a key role in Clair v4 and it's a concept that recurs in the rest of this talk. I was introduced to content addressability in the storage domain. What it is, is a way to always associate a unique identifier with a unique resource. A good way to put that in context is a checksum of some binary data: as long as the binary bits don't change, you can be sure that checksum identifies the data at hand. Clair utilizes this, and you'll learn more about it in a couple of slides. We have completely updated all the security advisory data sources that we go out to. We've moved away from ad hoc parsing of database dumps, like JSON or YAML dumps, wherever we can avoid it.
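Stepping back to the content-addressability idea for a moment: it really is just a checksum acting as a stable identifier. Here's a minimal sketch in Go (illustrative, not Clair's actual code; `digestOf` is a made-up helper name):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// digestOf returns an OCI-style content address for a blob:
// as long as the bytes don't change, the identifier doesn't change.
func digestOf(blob []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(blob))
}

func main() {
	layer := []byte("example layer contents")
	fmt.Println(digestOf(layer))
	// The same bytes always yield the same digest, so a digest can
	// safely serve as a cache key for previously computed results.
	fmt.Println(digestOf(layer) == digestOf([]byte("example layer contents")))
}
```

Because the identifier is derived from the content itself, Clair can treat any digest it has seen before as "work already done", which is the property the rest of the talk leans on.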
We now favor OVAL as the standard, and we were able to move most of the security data sources to OVAL, which is nice because we were able to build tooling around OVAL that makes parsing pretty quick on our end. We now have a native CLI tool; we no longer depend on external repositories to provide a CLI tool for Clair. We've completely rewritten the notification subsystem. We've also baked language support into the data model from the start, and we currently support Python. And we've completely redesigned Clair v4 as a microservice architecture with an emphasis on scale and performance. This allows operators to scale Clair asymmetrically: whether it's a push-heavy day or a read-heavy day, those two functions of Clair can be scaled independently to conform to your performance characteristics and trends. How does Clair work? This is the 30,000-foot view of what Clair does. There is a container; we'll use container and manifest somewhat interchangeably, since a manifest is just our JSON schema that represents a container. Clair is broken up into three services: the indexer, the matcher, and the notifier. The indexer is responsible for taking a manifest, or a container, and parsing out its contents: typical things like packages, repositories, and what distribution the container represents. It places these contents in a report we dubbed the index report. The matcher, on the other hand, is responsible for generating reports of flagged content. When the matcher gets a request saying, hey, does this manifest have any vulnerabilities affecting it, it goes to the indexer and says, hey, do you even have an index report for this manifest, and if you do, can I have it? When it receives it, it goes to its database and checks whether any of the content is flagged: packages, distributions, repositories. It takes all that flagged content, places it into an ultimate vulnerability report, and returns that to the client.
And the notifier just hangs out down there. It watches the matcher and the indexer, identifies new updates that come into the system, and then asks the indexer: hey, these vulnerabilities just entered the system; do they affect any manifests I care about? If they do, it sends a notification to any subscribed clients. So let's dig into indexing a little deeper. Indexing is the term we use for extracting the contents of a container. Let's take a quick look at what a container is in the context of Clair v4. A container can be viewed as a content-addressable hash with layers inside it, and those layers are content-addressable as well. The 604a hash here represents the high-level manifest, and the subsequent layers you see are also content-addressable. The order of the layers matters, and, again, that is expressed in our new manifest API. So how does indexing actually work? The first thing that happens is Clair needs to identify whether the manifest should be scanned. When Clair gets a manifest submitted to it, it checks: hey, have I ever seen this content-addressable identifier, this manifest hash, before? In this case it hasn't, so it says, okay, yes, you should be scanned; let's move to the next phase. Next, it determines which layers it actually needs to scan. I express a common case here, in that the base layer might have been seen a million times by Clair; this might be a UBI8 base layer. So it says, hey, I don't need to scan this, I've already seen it. But the subsequent layers it has never seen before, so it marks those as: yes, we need to scan these, I have no content available for them. When it actually goes to perform the scan, the base layer's contents are simply retrieved from the database; no work is performed, it's just a get from the database. The subsequent layers are actually scanned.
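As an aside before the scan itself: the skip-if-seen check Louis is describing is essentially a digest-keyed cache. A small sketch in Go (the `scanCache` type and its fields are invented for illustration, not Clair's actual schema):

```go
package main

import "fmt"

// scanCache sketches the digest-keyed dedup described above: layers
// Clair has already analyzed are served from storage, and only unseen
// layers are scanned.
type scanCache struct {
	done map[string][]string // layer digest -> packages found
}

// contents returns the packages for a layer, invoking the expensive
// scan only on a cache miss.
func (c *scanCache) contents(digest string, scan func() []string) []string {
	if pkgs, ok := c.done[digest]; ok {
		fmt.Println("cache hit, no work for", digest)
		return pkgs
	}
	fmt.Println("scanning", digest)
	pkgs := scan()
	c.done[digest] = pkgs
	return pkgs
}

func main() {
	c := &scanCache{done: map[string][]string{
		// the UBI8-style base layer, seen a million times before
		"sha256:base": {"bash", "glibc"},
	}}
	c.contents("sha256:base", nil) // hit: the scan function is never invoked
	c.contents("sha256:new1", func() []string { return []string{"python3"} })
}
```

Because layer digests are content addresses, a hit in this cache is always safe: if the layer's bytes had changed, so would its digest.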
This is the process of fetching the layers, decompressing them, looking for package databases, and identifying the characteristics of the individual layers. Finally, it coalesces all those results into a final index report and saves it. So the next time around, if it sees that 604a manifest hash again, it says: okay, yes, I have this data, we have already scanned this. By looking at the manifest hash alone, it knows it has scanned all those layers, because if the layers changed, the manifest hash would change. Cool. So for the matching part, I'm going to hand it over to Hank, who's going to walk through this further. Hank, just give me a small cue when you want me to change the slides. Will do. Okay, yeah. So now that we've got this index of the contents, and it's content addressed, we obviously don't need to redo that work, because it won't change. But what does change over time is the vulnerabilities; new vulnerabilities are discovered all the time. So the matching process is about handling this more rapidly changing data, the data that changes at all. Yeah, so next slide. The matching bit has two parts to it: the matchers and the updaters. The updaters feed data into the system, and the matchers use the data the updaters provide and turn into our common format. Next slide. The matchers take the index report and, all of them in parallel, examine it for the sorts of data they know about. So in this example we have bits of code all running at the same time: one looking for RHEL packages, another looking for Ubuntu packages, another looking for Alpine packages. Most of these won't find anything. The ones that do return their results, we combine them all together, and we return the report. Doing it this way means we can just additively add bits to look at new kinds of content.
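The fan-out Hank describes maps naturally onto goroutines. Here's a sketch under loose assumptions: `matcher` is a stand-in type, the match functions are toys, and plain strings stand in for Clair's real package records.

```go
package main

import (
	"fmt"
	"sync"
)

// Each matcher inspects the same index report in parallel for the data
// it knows about; most find nothing.
type matcher struct {
	name  string
	match func(indexReport []string) []string // returns flagged packages
}

func runMatchers(ms []matcher, report []string) []string {
	var (
		mu      sync.Mutex
		flagged []string
		wg      sync.WaitGroup
	)
	for _, m := range ms {
		wg.Add(1)
		go func(m matcher) { // fan out: all matchers run concurrently
			defer wg.Done()
			if hits := m.match(report); len(hits) > 0 {
				mu.Lock()
				flagged = append(flagged, hits...)
				mu.Unlock()
			}
		}(m)
	}
	wg.Wait() // fan in: combine everything into one report
	return flagged
}

func main() {
	report := []string{"ubuntu:openssl"}
	ms := []matcher{
		{"rhel", func([]string) []string { return nil }},   // nothing to say
		{"alpine", func([]string) []string { return nil }}, // nothing to say
		{"ubuntu", func(r []string) []string { return r }}, // the only one that cares
	}
	fmt.Println(runMatchers(ms, report))
}
```

The additive property falls out of the structure: supporting a new ecosystem is just appending one more entry to the slice, with no changes to the others.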
So, for example, our language support is implemented as a box on this second row here, and any OS automatically gains that support because of this design. These results are all then kept under the same hash as the index report, so we gain the same content-addressability benefits there. Next slide, please. And updaters: updaters, on whatever interval an operator has configured, check if there's a new version of the data source they care about. If there is, they fetch it, parse it, and load it into the database for you, and then any new matching requests will use that version of the data. We know internally whether a match has been run against the latest version in the database or a previous version, and if it wasn't run against the latest one, it'll run again and save the results for you. So even if these updaters run on a staggered interval, you'll always have the latest version of the results, and you won't redo work that's already been done. So next slide, on to notifications. Like I was saying, because we keep track of the versions of the security databases as we update them, we can internally figure out what's been added or removed, and then work backwards to figure out which manifests we've seen that are affected by what's been added. So the notifier effectively diffs these two versions of the database, discovers the vulnerabilities, looks at the manifests, and then performs some configurable action to issue notifications, whether that's firing a webhook or pushing out to a message broker like AMQP or STOMP. Next slide, I believe, is... oh, sorry, I skipped over a whole bunch of topics without cueing slide changes. Here we go: moving from Clair v2 to Clair v4. Clair v2 is the most widely used version out in the wild; that's what powers quay.io. In the process of building the new version, we ended up rewriting just about everything.
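The diff-between-update-versions idea behind the notifier can be sketched in a few lines. This is a toy, assuming plain string IDs stand in for Clair's real vulnerability records and update operations:

```go
package main

import "fmt"

// diff returns the vulnerabilities added and removed between two update
// operations, the way the notifier works backwards from database versions.
func diff(prev, cur map[string]bool) (added, removed []string) {
	for id := range cur {
		if !prev[id] {
			added = append(added, id)
		}
	}
	for id := range prev {
		if !cur[id] {
			removed = append(removed, id)
		}
	}
	return added, removed
}

func main() {
	prev := map[string]bool{"CVE-2021-0001": true}
	cur := map[string]bool{"CVE-2021-0001": true, "CVE-2021-0002": true}
	added, removed := diff(prev, cur)
	// Anything in "added" is then checked against known manifests, and
	// subscribed clients get a webhook / AMQP / STOMP notification.
	fmt.Println(added, removed)
}
```

Keeping versioned snapshots of the security data is what makes this cheap: the notifier never rescans anything, it only compares two known states and follows the links back to affected manifests.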
So the API is completely different. Again, it's manifest driven, not layer stitching, and all of our internal data is completely different. So you do have to resubmit images to Clair, but it should be faster: a little initial pain and some win on the back end. And now I'm going to hand it off to Alash. Okay, hello everyone. My name is Alash and I will tell you something about how we use and release Clair at Red Hat. Next slide, please. So as you might know, and as you've already heard, Clair is an open source project for container security scanning. But it's also part of the Red Hat Quay product. Red Hat Quay is a container registry for the enterprise that provides a solution for storing and building container images. And since security is an essential part of the container workflow, Quay has an integration with Clair, and it provides information about vulnerable content in both the user interface and the API. So if you either build or push a container into the Quay registry, Clair indexes the image in the background and produces the index and vulnerability reports, as was described earlier. Those reports are visible in the user interface, and the user can check whether their container is vulnerable and whether they need to rebuild the container to pick up the latest versions. In previous versions of Quay, Clair v2 was used, up until Quay 3.3, where Clair v4 became available as a tech preview, and it becomes the default container security tool moving forward: Clair v4 is released as GA in Quay 3.4, and Clair v2 is deprecated. Clair is also available in the public instance of Quay, known as quay.io, where Clair v4 will be used as the default scanner at the end of March 2021. Next slide, please. And Red Hat not only releases Clair and offers it to customers, but we also use it internally, and we are integrating it into our container release pipeline.
This means that every image produced by Red Hat or our partners is scanned using the Clair security scanning tool, and the release process is driven by the results. So if we discover a vulnerability in the report, we don't ship the container; instead we rebuild it, because we don't want to ship vulnerable content to customers. Container security information is also publicly available in the Red Hat container ecosystem catalog, represented by the so-called freshness grade, which is a simplified, graphical representation of a container's security score on a scale from A to F. Previously at Red Hat we used a custom solution for container scanning, but since it was completely internal, our customers didn't have a way to reproduce our grades. In 2019, we replaced the custom solution with the upstream version, Clair v3. And last year, in October 2020, we upgraded to the current latest version, Clair v4. Our instance of Clair is deployed in an OpenShift cluster at production scale as three separate services: one for the indexer, one for the matcher, and one for the notifier. That allows us to easily scale those services up or down based on load. In the past three years, Red Hat has built over a million images, and currently, in our production database, Clair v4 has indexed over 400,000 of them. Okay, now let's hand it back over to Louis. So let's go over what we expect from Clair v4.1. Clair v4 hit GA a couple of weeks ago, so this is the roadmap for the next minor version. We'd like to expand on language support quite a bit. I believe there are already some plans for Java, which is going to make its way in via another collaboration with CodeReady Containers. We'd like to address Golang, and definitely get some JavaScript, the Node/NPM ecosystem, in there.
Hank is currently plugging away vigorously on a Kubernetes operator, which should make deploying Clair, in the various architectures you can deploy it in, pretty trivial to set up. We're going to put an emphasis on performance and scale analysis. We really want to know how well Clair scales in the event of excess traffic, or trendy and bursty traffic, so we'll be putting a lot of effort into that. And then just general reliability efforts: things like making sure database connections don't flood the downstream database, and making sure we push the right errors forward so clients can react to the various states of Clair. As for contributions, we're going to begin community development meetings at a bi-weekly cadence starting in March. I'll have a link to those meetings in a subsequent email, along with a mailing-list blast and the other ways we disseminate that information. There's a quick link to our mailing list; I would go follow it. If you are interested in community development or further OpenShift briefing meetings, we'll definitely be posting there. We'll also be placing a lot of information in the repository, which is quay/clair; in its Discussions area we'll be posting a lot about how to interact with the community moving forward. And then there are a couple of contacts here: if you need to reach out to us, these are our emails, and there's a general Quay email. So that kind of wraps it up. I hope that was informative and filled you in on the work we've been doing on Clair v4. Well, thank you very much, guys, for joining today. I'm wondering, Hank, if there's a repo for the Kubernetes operator, if there's an ETA for that, and if you need people from the community to test it? Not yet. It's still in the half-broken building phase. When you get ready for that, let me know.
There are a lot of folks we can reach out to and get to test that for you on different platforms. Yeah, we will. Our ultimate goal is to make it integrate seamlessly with the Quay Operator, so that it pulls Clair in transparently behind the scenes and you get all the scalability benefits of something that understands Clair's workload, but it just comes along for free with your container registry. And Louis, you were talking about performance and scale analysis, and trying to get a better grip on how Clair is doing out there in the wild. If there are folks out there who want to work with you on that, or give you feedback, what's the best place to reach out, is it that Clair mailing list? Yeah, definitely, that would be really great. If it does pique anyone's interest, they can contact us in a couple of places: posting on the mailing list, GitHub issues, or GitHub discussions will get our attention pretty immediately, or just shoot us an email and we'll take a look. All of those options should be viable. I know about the use of it in-house; Alash referenced the million images scanned inside of Red Hat, or rather, that we've created. I think we've created more than that; personally, I think I might have made a whole bunch of messy images that should be scanned. But that's growing to other organizations that can utilize performance and analysis metrics and give us feedback. Is there any telemetry being added into Clair, or anything like that, to collect that from external deployments? Yeah, Clair is capable of sending metrics to Jaeger; it's a built-in feature.
And of course, there are things like making logs available in a form where you can easily build a dashboard out of them. For example, in our use case we use the ELK stack together with Kafka, so we're able to see some metrics directly in a Kibana dashboard. Okay. And I'm going to read out one of the questions. I know you answered it in the chat, but I'll read it out for people who might be watching the video later. Will the expanded language support for Java, et cetera, include sources outside of Red Hat, i.e. Maven Central, or will it be primarily focused on libraries that Red Hat supports? And Louis, you had a nice answer to that. Yeah, definitely. So Clair takes an open source approach to begin with: all the data we collect is independently managed upstream. The reason we do that is that the most accurate data is at the source of the data. So we go out to, for example, Pulp manifests for Red Hat data, and the Ubuntu security tracker for Ubuntu data. We are pretty adamant about not using, quote-unquote, closed-source or aggregated data. We really want to keep consistency and a level of accuracy by going to the sources. So, the long-winded way of answering your question: definitely, we will go out to Maven Central or any of the official sources we can find, as long as we can grab the data in an open source way. Any kind of closed-source data source, we'd have to open up more of a discussion around how that could work, but that hasn't happened yet. Yeah, for me it's always annoying to find a piece of software that looks like it does exactly what I want, only to find out that there's some hurdle in the way and you need to go get a key, plug it in, and use it. That's not to say we don't ever want to be able to do that, but we want to make sure everything we ship works by default. All the data is there; all the checkboxes in our README you can go do without having to sign up for another service.
And there's one more question in the chat, and this is probably top of mind for a few folks: how would you define the relationship between Clair and StackRox? Are they mutually beneficial, or is one a superset or subset of the other? We just had the KubeLinter folks on last week, I think, so it's probably topical. Yeah, I can talk about what I know so far. I know that there are no official talks about how the teams will work together, but they utilize Clair v2, so we've already started discussing at a high level what Clair v4 can offer them. I imagine there'll be a pretty tight collaboration in the future, although it's a little above my pay grade to determine those things. But as far as I can tell, they utilize Clair as a product and they would be interested in Clair v4, so we should see more of a merging than any kind of separation or schism between the products. And I think they would fall in line pretty well, since the underlying technology they use for at least the scanning portion of their product is the same. So it works out pretty well. There's another relationship where we can get more performance and scalability feedback from people using it, another whole chunk of the ecosystem, plus more engineering resources hopefully coming our way to work on this. And that's always a good thing. Yeah, that's always the most exciting part. One other earlier question that I wanted to get a read-out on, for anyone who's watching this and not in the chat, was about matcher support across languages. Yeah, definitely. So we support remote matching, which means that in the process of looking at the data the matcher service received from an index report, not to get too technical with the terminology, it can take a look at those contents and then, instead of looking at its own internal database, go out to an API. So anyone can go and write an API in any language you'd like.
If you do want native matching support, there are obviously some advantages there, but you will have to write it in Go at this time, and it will need to reside in the Clair repository at this time. We are working on possibly allowing out-of-tree matchers, matchers that live in other repositories. But yeah, if you are interested in cross-platform support, looking into our remote matching might be applicable to you. Right. Well, I don't see any other questions in the chat. You said you might have a little demo to give us, if you're up for that. That would be great for those of you who haven't played with Clair. Let me get that pulled up for a second. There we go. Okay, cool. Pause for one more second, there's one more question: will Clair v4 be pushed to the Container Security Operator, and if so, when? So, as I understand it, the Container Security Operator is written against Quay's API, so it depends on having Quay, and the latest version of Quay ships with Clair v4 as its security back end. Thanks. All right, let's get going with the demo. So what I have here is a local Clair environment. You can do this today if you want to: you can go and pull the Clair repository and run this command, make local-dev-up; just make sure you have the necessary dependencies. We won't go into that right now; it involves Docker Compose, but you can just look at the repository for that. What it does is set up Quay and Clair locally on your machine. Quay lives at localhost:8080. So what we can do is go over here. I've already logged in, created an admin account, and created this organization for Clair v4. When you're utilizing the local environment, you can use podman for logging in and pushing an image to the local repository. You'll want to specify --tls-verify=false because it's a local environment; we just didn't bother to wire in SSL.
It just kind of gets in the way. So what we can do is a login, and that's at port 8080, so this will log into the Quay instance that is running here. And I have a container, it's just ubuntu:latest, and we can go ahead and push that. You'll notice that you have to tag images with localhost:8080, and then we can push to the local Quay instance. Okay, it barked at me for a bit, but it looks like the push was successful. So we'll go over here to a repository called testing. If you go to this tags area, you'll see that we have successfully scanned it. Let me make this bigger; I'm sure it's quite small right now. This was the image we just pushed, and Clair was able to scan it. You can go and dig into the security information via this GUI. Yeah, so if you'd like, we can just give you a quick overview of what Clair actually did. Yeah, while Louis pulls this up: again, one of the cool things is that because everything is driven off of this content addressability, and Clair did the work of scanning this base image once, any new container that shows up only needs its new layers analyzed, which makes it a bit snappier. Yep, it makes things pretty quick. So, if we just scan through the logs right here, you have to be acquainted with the application a little to understand, but the indexer is really going, and it's doing an analysis on the layer. For instance, the scanner for the AWS distribution says: I didn't find an os-release file, so I couldn't identify it, I'm just going to continue. So, yeah, the indexer itself is doing this: it's grabbing the layers, it's identifying any items it can, and it's storing them. Then it's storing the index report, and the Clair matcher is going ahead and issuing these matches.
So, as you can see, it found it's an Ubuntu image, so the Ubuntu matcher is interested. This corresponds to what Hank mentioned in the talk: most of the matchers aren't going to care, because it's not like you're pushing some kind of hybrid between an Oracle Linux and a Photon container. So it's nice, because even though we're fanning out, we're not really doing a lot of work; only one of the matchers actually winds up doing work in the common case. So this is the matcher basically evaluating the pushed content. This process returns a vulnerability report, and that vulnerability report is pretty much just parsed by Quay and presented here. So, yeah, you can do all this right now if you want to go pull the Clair repository and play around with it. Again, you would just go and run this, and it will deploy a Quay instance at localhost:8080. If you have any trouble with that, we've been doing a new thing where, if you have quick support questions, you drop them in the Q&A section of Discussions on GitHub, and that seems to work pretty well; we can mark things as, yes, this is the correct answer. It's been a nice place to put any support question, really. If something doesn't work, it's a bug, right? You should put something there; we'll triage it and work with you to understand why your environment might not be working. But yeah, that's Clair v4. Hope you liked the demo. Loved the demo. How about you throw up the resources slide one last time as we wrap up, so people know where to find you again. And I don't know if anyone else on the call, Daniel or Bill or anyone else, has anything we missed that we should have covered. I think you did a great job here, guys. I'm looking to see if there are any other questions coming in from chat, but I think we're good.
I think we're going to have to get a bunch of folks, hopefully listening to this out there in the universe, to come to the community development meetings starting soon. And we will post this video up there, along with the slides if you share them with me, and get it out. As always, great stuff from the Clair team, and we're really grateful for all the work that you guys do. So thanks for coming today and sharing all of this with us. All right, take care, guys.