Good morning or afternoon. Um, is that Kapil, from Cloud Custodian? Can you hear me?

Hi, yeah, I can hear you.

Um, I just realized I'm not logged in as the host, so I wanted to check to make sure that you could share your slides and present, if you have slides.

I do, and I checked that, and it seems to work.

Okay, super. Then I can just be Sarah Allen; I don't have to be SIG-Security anonymously. I will put the notes in the chat for people who are joining. We have a tradition of everybody adding themselves for attendance, and then I'd also like to call out, for the regular members: anybody who's sitting in front of a computer and willing to scribe? We just take notes live in a Google Doc so that people who can't attend can get highlights and decide whether they're going to watch the videos. We generally post the videos; one of our fabulous staff people at CNCF posts them shortly after the call, usually in the next day or so. I also want to let the regular group folks know that we've been talking in a small group about going back to what we were doing in 2018 and early 2019, where we had presentations every other week and working-group meetings every other week, which were more discussions and check-ins and talking through different proposals. So this is unusual, because we've had two presentations in a row, but that's because Brandon couldn't be here; the Kubernetes Forum is happening in some place that he had to travel to. So we're accommodating Brandon, who's been active and had a discussion topic that he wanted to cover, and because it's before the holidays I didn't want to just move this to January. So if there's time at the end, we'll cover some team logistics and discussion topics about things that are top of mind, but if needed we will spend the whole time discussing Cloud Custodian. I'm excited to have Kapil here to do a presentation, and I talked to one of your colleagues on
Slack and suggested that the presentation be 20 or 30 minutes, and then we would have time for discussion. I want to allow as much of that time as people have questions and things that they want to discuss. So thanks for posting the slides in the chat. I will just wait a moment for people to add themselves for attendance before we get started. Do I have a volunteer scribe? Since it's mostly a presentation, I can scribe if needed. And then we have a place at the end of the agenda: if you have announcements, please type them into the agenda, or feel free to put them in the chat if you aren't able to use Google Docs. Then we don't necessarily have to take time for the announcements if the discussion is taking longer. So with that, I will ask Kapil to introduce yourself and Custodian, and tell us a little bit about who you are and why you're here.

Cool, thank you. So I'm here with my co-presenters, John Mark Walker from Capital One and Andy Long from Microsoft; I work at AWS. We've been interested in having Custodian be part of the CNCF as an incubating or sandbox project, and as part of the new process around that, we were told we should come talk to SIG-Security and give some background on the project, and that's what this presentation is about. I'm first going to hand over to John Mark so that he can do a bit of an intro and give context on the project history. John, it looks like you're muted.

I was; thank you for catching that, because I started to speak and then started to check. Thanks, everybody. It's a pleasure to be here. My name is John Mark Walker; I run the open source program office at Capital One. Just to do a little bit of the stage setting here:
this is something that we've been talking about for, I don't know, at least a year now, probably starting before my time, before I joined Capital One. But since I joined last August, this has moved top of mind for me, and it's very important to me that we build on and improve upon previous open-source efforts, and this is one way to do that. So one of the things we're looking to get out of this session is some sort of working relationship, as well as next steps we can follow, so we can make sure that we get things across the finish line at some point. I'll be looking for direction from the rest of the SIG on that at the end of this. But we're finally at the point where we can say: yes, we absolutely want to do this. It took a lot of time and effort to get here, so it's been a journey, and we're looking forward to continuing that journey with the rest of y'all. I just wanted to introduce myself and lay the groundwork, and to say that the technology is relatively mature; it's been in development for over three years now. I think it's something that would be appreciated by the rest of the community, and I'm hoping that some of you on the call are at least somewhat familiar with it. But with that, without further ado, I'm going to turn it back over to Kapil, and I look forward to seeing what we can come up with.

Thank you, John. So what is it?
So Cloud Custodian is a stateless rules engine that is intended to help customers and users manage their public cloud accounts at scale. By way of background: when I was at Capital One and we were first going to the cloud, we recognized that dealing with the regulations, security, and compliance aspects around cloud meant we wanted to automate as much of that as possible. The natural tendency for organizations, as they start out their journey in the cloud, tends to be writing one-off scripts around these different requirements, and extrapolating forward from the start of that journey, I was seeing a future where we would have hundreds of these random scripts, and there would be questions about who deployed them, how well tested they were, and what the operations around them looked like. So that was going to be a bit of a mess, and we wanted to take a step back and look at how to deal with that problem holistically. Custodian tries to be that Swiss Army knife around all the different concerns an organization may have about their cloud footprint, be it managing security or cost optimization, and to do that it integrates very deeply with whatever the cloud provider's native tooling is: Google Cloud Functions, Amazon Lambda, AWS Config. Whatever those native capabilities are, it tries to integrate with them fully and expose them to users through its DSL, to become the easiest way to consume some of these new provider features. So at its heart, it's effectively a set of policies written in a YAML file. Each policy targets a particular resource type; it has a vocabulary of hundreds of resources. You can then execute those policies in different execution environments.
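As a rough sketch of that shape (not taken from the slides; the resource and tag names here are illustrative), a policy file looks like:

```yaml
# Illustrative sketch only: every policy names a target resource type,
# filters that select a subset of those resources, and actions to take.
policies:
  - name: example-policy          # hypothetical policy name
    resource: ec2                 # the resource type this policy targets
    filters:
      - "tag:owner": absent       # select instances with no owner tag
    actions:
      - stop                      # act on whatever the filters matched
```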
So the engine is agnostic to where it's being executed. It could be executing in a container, on a Jenkins box, or in a serverless function; it's agnostic to its execution environment. But when a user does specify an execution environment, it will do the work of actually provisioning all the event streams and serverless functions behind the scenes for them. So for the actual policy: in this case, we're looking at Amazon's Elastic Block Store volumes, and we're going to filter for any volumes that are not attached to an instance and that don't have a particular tag, "retain". So we're filtering the set of resources that this policy is targeting to find the things we're looking for, and then we'll take an action, or a set of actions, on them. In this case, we're actually marking it for an action in the future, so we might garbage-collect it in three days and send out a notification. One of the key notions around Custodian is this ability to decompose our policies, creating policies from a vocabulary of very fine-grained filters and actions. So we might have an action like stopping any EC2 instance, which we use for off-hours, or we might use it in response to a security event as a remediation activity, or for incident response. Continuing forward, the other key aspect of getting transparency into what these policies are doing is having a very rich set of outputs.
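The EBS policy walked through a moment ago might look roughly like this (a sketch: the `retain` tag name and the notification transport are assumptions, and the `notify` action relies on the separate c7n-mailer component):

```yaml
# Sketch of the unattached-volume policy described above.
policies:
  - name: ebs-unattached-mark-for-deletion
    resource: ebs
    filters:
      - Attachments: []           # volumes not attached to any instance
      - "tag:retain": absent      # and not explicitly retained
    actions:
      - type: mark-for-op         # tag the volume for a future operation
        op: delete
        days: 3                   # garbage-collect in three days
      - type: notify              # delivery is handled by c7n-mailer
        to:
          - owner@example.com     # hypothetical recipient
        transport:
          type: sqs
          queue: c7n-notify-queue # hypothetical queue name
```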
So Custodian integrates natively with all the different cloud providers' storage, metric services, distributed tracing, and log integrations, so that users have easy access to metrics dashboards and resource dashboards showing which policies are compliant or not compliant, as well as the ability to take the raw logs from an output storage origin and index them into, say, Elasticsearch. And then of course the other key aspect of doing any sort of remediation activity inside a cloud environment, in this real-time fashion, is being able to send notifications to users. So we enable sending out to Slack, integrating into a downstream Splunk, and sending out email. So here are some example policies. You can chain policies together to create richer workflows, so a semantic workflow might split out into multiple concrete policies. In this case, we've got a policy looking for EC2 instances that are not appropriately tagged and marking them to be stopped at a future date. This gives the end user who provisioned the resource a chance, in combination with, say, a notification, to remediate it themselves; if not, the policy will come back through. A lot of the concern in governance here is that an organization may have hundreds of application teams, and in this context what we're really looking to do is give a centralized team, either security operations or compliance, the ability to have a broad-based assurance that, regardless of what tools the application teams use to provision, be it Terraform, be it CloudFormation, be it resource templates, their cloud environment is conformant to a known baseline defined in these policies. So what can you do with Custodian?
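The tag-compliance chain described above might be sketched like this (the tag names and the four-day window are assumptions, not slide content):

```yaml
# Sketch of the chained workflow described above: untagged instances
# are marked for a future stop, giving the owner a window to remediate.
policies:
  - name: ec2-missing-tags-mark
    resource: ec2
    filters:
      - "tag:owner": absent       # assumed required tag
      - "tag:c7n_status": absent  # skip instances already marked
    actions:
      - type: mark-for-op
        tag: c7n_status
        op: stop
        days: 4                   # stop in four days if still untagged
```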
Lots of things. A lot of these filters and actions, because they're small and composable, are reusable like Lego bricks to create all kinds of things. We'll get to some of our community later, but we have thousands of users using it for every conceivable thing that they've been able to express, and a lot of these are things that we, the authors of the tool and the original contributors, wouldn't have thought of. So, looking at what it looks like to run and deploy this thing: it's a one-line install of the stateless engine. As far as getting started, you can run it in Docker, or you can pip install it; the tool itself is written in Python. And then of course you have this rich vocabulary of execution modes, where the tool will actually provision and hook up the event streams for you. The tool also abstracts out where it's getting its data: the execution modes define where the policy is running, and the policies themselves are fairly isomorphic with respect to that location. The policies can also pull their source data from different places: in some cases a given execution mode will just take whatever is in the event stream; in some cases it'll go to the describe API calls available from the cloud, like the gets; and in some cases it'll use a CMDB resource database as the source of the information it starts processing. So I'm going to hand this over to Andy from Microsoft.

Hi, I'll take it from here. I'll just do a quick intro: my name is Andy Long. I'm an engineering manager at Microsoft, and over the past year and a half,
we've been contributing heavily to adding support for Azure in Cloud Custodian. One of the big components was having compliance as code, and this has been really effective for customers we've worked with, and internally at Microsoft: having your compliance policies written and stored in YAML files gives you the ability to version them and actually go through a good pipeline and process for deploying them. We've built a lot of tooling in Cloud Custodian around it as well. We have a tool called policystream that allows us to look at a Git repository's history and compare the changes in the policies over time, and all of these have effectively added a lot more rigor to compliance and to how you actually end up deploying these policies. We've seen integrations for the policy deployments with Drone, Jenkins, and Azure DevOps, and this has been a really important component going forward in this space. We can go to the next slide. On the Azure side, it's similar to what Kapil was saying: we built the integration with Azure Functions so that that could be one of the hosting environments for Custodian. In addition, you can always run Custodian in a VM or a container, whatever you choose
that's the best fit for your situation. In particular, for Azure we're leveraging Azure Functions as the serverless offering, and we're able to subscribe to events happening inside your Azure subscription via Event Grid and trigger off those events to perform some action. This is an example of a simple policy around Key Vault: here, when a key vault exhibits a write event, we filter for a particular tag, which is a creator email, and when there is no creator-email tag, the action is to go ahead and tag it. This is a really important scenario, just to help with ownership of resources: developers, and even people in production, are deploying a lot of these resources, and being able to map a resource back to who the ultimate owner is is really helpful. We can get to the next slide. I added a little flow here to help with the visualization. Azure subscriptions can emit activity logs that we subscribe to through Event Grid, another Azure service, but we pull these into an Azure queue, which is a mechanism that lets us deliver them really anywhere. One of the hosting options I alluded to was Azure Functions, which will listen to this queue and dequeue messages there; that's where Custodian actually executes. The outputs are potentially stored in Azure Storage (we have some other options), and then a lot of the metrics, executions, exceptions, and anything else around the monitoring of the function is stored in Application Insights, which comes under the Azure Monitor umbrella.
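The Key Vault policy walked through above might be sketched as follows; the tag name and event syntax here are assumptions based on Custodian's Azure event-grid mode:

```yaml
# Sketch of the Key Vault policy described above, reacting to write
# events in real time via Event Grid.
policies:
  - name: keyvault-tag-creator
    resource: azure.keyvault
    mode:
      type: azure-event-grid
      events:
        - resourceProvider: Microsoft.KeyVault/vaults
          event: write              # fire on Key Vault write events
    filters:
      - "tag:CreatorEmail": absent  # assumed tag name
    actions:
      - type: auto-tag-user         # stamp the principal who made the call
        tag: CreatorEmail
```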
So we have a really nice flow for customers running Custodian in production, where you can have all the logs integrate right into the native Azure solutions, but you also have the flexibility to output them really wherever you want. Let me go to the next slide. (Oops, that went back one.) One thing to really point out, about where Custodian fits into the grander picture for Azure in our context, is that we were looking for something that would complement all the other Azure services, and you can essentially draw a very similar diagram for the other cloud providers we're associated with. Custodian fits in in conjunction with all these other governance tools that Azure supports natively, which is very important for customers we've worked with, because they want to use the Azure-native tools; but there's always a point, as their governance implementation matures, where they're looking for more customization. Also, when you look at customers that have more multi-cloud deployments, having something that can unify that, and that is consistent across all the clouds, helps with their governance story. So this is just a really important aspect: we're not using Cloud Custodian to replace anything; it complements all the stuff that already exists in this space. Thank you; I'll hand it back to Kapil.

Thanks. Unfortunately, most of our GCP contributors are actually out of Eastern Europe, so they were unable to attend, but the logical notion is that these policies look like this. This is a policy for GCP that, any time you start an instance, will effectively say: if it's got a quarantine tag, then go ahead and stop it. Now, it's important to keep in mind that the workflow Andy showed, that flow diagram, is also present for GCP.
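A sketch of the GCP policy just described might look like this; the audit-log method name and label key are assumptions:

```yaml
# Sketch of the GCP policy described above: when an instance is
# started, stop it again if it carries a quarantine label.
policies:
  - name: gcp-quarantine-stop
    resource: gcp.instance
    mode:
      type: gcp-audit             # subscribe to audit-log events
      methods:
        - v1.compute.instances.start
    filters:
      - type: value
        key: labels.quarantine    # label key is an assumption
        value: present
    actions:
      - stop
```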
All of this, the user doesn't have to do; it provisions all these things. When you run the Custodian command line and give it this policy with that execution mode, it will do all the wiring and provisioning for the user, so that they don't really have a lot of DevOps responsibilities. So in addition to the stateless rules engine, we have this provisioning around the policies and their execution environments, and that's all delta-diff, state based, where an update to a policy does an incremental update to the versioned infrastructure. And, switching back to AWS: this ends up being a super-powerful capability from a governance perspective. A lot of Custodian, from a real-time perspective, is geared towards these detective controls. Anything that users can express via the IAM language of their provider, we recommend they do that first; but IAM decisioning is maybe not as flexible for some of the nuances people want to express in policies. So this ability to introspect the API call stream happening in the provider infrastructure, in real time, to make sure the things being created are compliant with policy, ends up being really powerful. And again, as another example of integrating with whatever the cloud provider's native capabilities are: one example here, with AWS Config, is being able to take a given policy, simply set the mode to config-rule, and have it deployed as a custom Config rule within that native service. From a multi-cloud perspective, across the different providers, we cover off on the key feature set: the API subscription and observation capability exists natively across all the
providers, and then of course logging, metrics, and multi-account support exist as well. Custodian is also an umbrella project around several different tools that help users with automation and operations. c7n-org is our parallel multi-account, multi-subscription, multi-project execution tool, which allows a user to take a set of policies and execute them in parallel across accounts, regions, etc. The other component is our notification system, c7n-mailer. It can be deployed as a serverless function or run within a container; it subscribes to a data flow of actions from policies that are trying to do notifications, and then does formatting and delivery to multiple different downstream channels. Just to give a little more flavor around the operations, cost, and security aspects: this is a policy that, in the same way Andy's policy was tagging Key Vault creators, subscribes to any time someone creates an S3 bucket and adds whoever created it as an owner tag on that resource. From a cost-savings perspective, this is mostly about the ability to look at a resource, look at its metrics stream, find things that are underutilized for their size, and send out notifications, or in some cases do resizing. And of course you can also do off-hours. From an IAM perspective, the policy on the left ends up being a Config rule that uses the IAM simulator against the instance roles associated with an instance, to find any instances that have the ability via IAM to create another IAM user; these end up flagged as overprivileged inside the Config dashboard. On the right side, there's a separate policy that looks for access keys that haven't been used in 120 days, and then goes ahead to
post a finding into another native service, AWS Security Hub, where it will be correlated as an additional finding and triaged from there. So, talking about community: we've got many different channels. We've got our main homepage, and we're currently on GitHub. From a chat perspective, we've been using Gitter, with about a thousand users in chat, and from a project-stats perspective we've got about 230 contributors. Lots of unit tests, about 25,000 downloads a month, and well over a million total downloads. Looking at 2019, we've had about a hundred contributors merging about 750 pull requests, and this chart breaks it out by which particular provider or feature they were working on; it's pretty evenly split across the core and the different providers. We've been starting to pick up on some of our Kubernetes work, but I can talk through some of that under the roadmap. As far as breaking it out by companies, this is the current breakdown, along with the top contributors, and all of those top contributors are also maintainers. From a principles perspective, Custodian tries to focus on being operationally simple. It tends to be used in a lot of enterprise contexts, but it also tends to be used in smaller shops for one-off cost optimization, so we want it to be fairly simple to run and operate, and we try to integrate with the native services as much as possible, partly to alleviate any operational burden on end users. We want to keep the core fairly simple and minimize the vocabulary we're introducing to users; we have probably a thousand different filters and actions across the categories of resources, and we try to make those fairly orthogonal and simple for users to get an understanding of. Helping with all of this is the schema validation and the dry-
run capability that we run in CI; we use a JSON schema for this. All those filters, actions, and capabilities are automatically documented out of the code into our doc site as reference documentation. As far as our roadmap: the tool itself is written in Python, and we're looking at how we navigate the end of life of Python 2.7, given that a lot of enterprise users are on enterprise distributions, which tend to be a little later in picking up that deprecation. We're also introducing a bit of lazy loading: as we expand the number of features, we want to make sure we're keeping cold starts and CLI execution fast, by not loading anything more than we need to. Some structured logging; this is sort of our internal to-do list, not that I'm going to gloss over it, so feel free to ask questions. As far as Kubernetes integration, we have a Kubernetes provider. It's pretty minimal at the moment; it allows for poll-based querying of Kubernetes resources and expressing policies on most of the Kubernetes built-ins, across the application namespace, as well as CRDs. We'd like to ramp that up, building out our initial controller facilities. To date, Custodian has been fairly unopinionated about how people deploy it; we've seen Jenkins and Brigade and Kubernetes and GitLab CI, and so I think Kubernetes offers us a good baseline to start having an opinionated deployment, as far as what good operations and good deployment of a Custodian management framework look like on top of Kubernetes. So, any questions?

Hi, this is Ash. Great presentation, you guys. Quick question: how does this project compare with Open Policy Agent? Do you see any complementing features, or distinctions? Just wanted to get your thoughts on that.

Sure. Great question.
Open Policy Agent has, I think, some complementary features. I think OPA does a really great job of enabling edge-based decisioning, and it's very decoupled from data sets. The mission for us was to have that real-time behavior around cloud environments; that's where we started off, and OPA started off around Kubernetes, as far as some of its tighter integrations go, and it's been around in this ecosystem for a long time. For us, this is about growing out our accessibility to cover the full space around infrastructure. If we look at where OPA has gone, as far as its decisioning, a lot of it is going out to the edge, like SSH integration, and so it's a very decoupled engine, and that's been really nice. One of the things that separates the two is that OPA wants to keep a full inventory in memory to do its decisioning; Custodian itself will typically pull on demand any additional information it needs from third-party data sources. So a lot of the filters we're using will do additional API calls into a cloud environment to verify things. For example, validating that a launch config is valid requires us to verify the security groups, the image, etc. around it, whereas OPA is typically trying to do fairly localized decisioning on whatever it has in memory at that moment.

Cool. So do you see Cloud Custodian not being run at the edge, as OPA is? It's more of the validation, is that what you see?

Yeah, we're typically operating on the control plane directly; we're not necessarily doing edge decisioning. We'll do control-plane integration with whatever the control-plane event stream is, to do validation and enforcement.
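The launch-config example here corresponds to a filter Custodian ships for auto scaling groups; a minimal sketch (not slide content):

```yaml
# Sketch: Custodian's ASG resource has an `invalid` filter that performs
# those extra API calls, checking that the security groups, images, key
# pairs, etc. referenced by a launch config still exist.
policies:
  - name: asg-invalid-launch-config
    resource: asg
    filters:
      - invalid
```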
Of course, we can be run incrementally across a partial subset, or be run in dry-run mode on a periodic basis; it doesn't have to hook up the event stream. But we're always operating against whatever the control plane's APIs and event streams are. Whereas with OPA, I think you can do some of that; as far as the built-in integrations today, the only place I think that's a valid statement is really the Kubernetes-specific integration, Gatekeeper. But if you look at where it goes in the wild, it's typically with a smaller footprint, out towards the edge. I have some notional research experiments I'd like to do, now that OPA has some support for compiling Rego to WASM; it might be interesting to explore additional integrations, but that's speculative at this point.

Thank you. So, Ash, do you concur with that? Are there things about where you see the overlaps and differences? I mean, it's okay if you don't know any more than what was just said, but I wanted to toss that question back to you.

Sure. Yeah, I think some of the points Kapil mentioned make sense. OPA, you can run it on the edge, and it's more focused on performance and low-latency use cases like authorization. For Cloud Custodian, I would say that since it's pulling everything down at eval time, that's not a very favorable use case, so I would see that as one distinction in the way OPA and Cloud Custodian get used. And yeah, OPA started off more towards Kubernetes, but it's a general-purpose policy engine, so I could see a way of integrating it with some of these use cases where you're putting Cloud Custodian with AWS or GCP. So yeah, there's overlap in those kinds of use cases, like admission control, but performance-wise, with OPA running on the edge,
I would believe it's much more performant, rather than pulling, because it does everything locally. Those would be my initial thoughts about this.

Thanks, Ash. I have a bunch of questions, but I wanted to let the group ask questions first. Anybody else have questions? All right, I'll dive in. So, I noticed your GCP support is in beta, and, oh, way early on we had an issue where we had wanted to (our "issue" meaning we file issues when we want to invite people to talk about things), there was an idea that we would invite Cloud Custodian and Forseti, because at that time, if I recall correctly, Cloud Custodian was for AWS and Forseti was just for GCP. Now I was excited to see that one of the platforms has embraced cross-cloud support, and I was wondering whether you've reached out to Forseti. It seems like, that being a Google project, they could potentially contribute a lot of the Google stuff, and would it make any sense to invite them to participate? Or maybe you've already had this discussion.

Absolutely. So there have been a number of discussions over time with all the cloud providers, and there have been some contributions from Google, though not directly from the Forseti team. I looked at Forseti well before I started working on GCP support, a few years ago. Especially moving into CNCF, I think there's a strong opportunity for getting some contributions from the Forseti team. If I were going to compare and contrast what they do today: Forseti is typically doing a running pull of all the resources, or integrating with Cloud Asset Inventory, running a set of rules, and dropping the results into a database on each run. From an architecture perspective, our goals were around offering that real-time response. It wasn't something that Forseti was really
It wasn't something that force eddy was really Doing when we looked at it, and I don't think it's out of that in the interim sense, but potentially it's on the roadmap so what we sort of targeted initially was this capability around sort of doing that real-time integration, so hooking up, you know Cloud audit logs to pub subtopics to Google Cloud Functions and enabling those policies to be deployed out there To be able to do real-time response on the order of a few seconds To to respond to events as opposed to getting a database on a polling basis Every few hours that that has some set of things that need to be remediated We've generally found that from remediation perspective if we can remediate it things and send notification to the user Immediately that they have a much better experience like they're not setting up an instance and getting it all configured and deploying their application And then a few hours later having it sort of removed Outside of say their deployment or their operation their change order window Whereas with this they sort of get that immediate feedback that have an email in their inbox in a minute the resources immediately, you know remediated and It tends to be a much nicer flow from an end user perspective But to answer the question. Yes, we would love to have for additional contribution from the force editing they've done a lot of work around sort of a set of you know, a Static set of rules around what is sort of best practice in their environment? 
And those best-practice rules are something we would love to collaborate on, and potentially be able to express as Custodian policies going forward.

Right, and that leads to my next question. Thank you for the compare-and-contrast of the architectures, because I had looked at Forseti a couple of years ago, but that was a while back. One thing that reminds me of: it's been a while since I did application development, but I remember in the early days of Lambda, I hadn't heard that Amazon had SLAs guaranteeing that if an event happens, your Lambda always gets executed, and in the early days delivery was actually pretty incomplete. There's always a trade-off between how real-time you can be and how accurate you can be. So what are your thoughts around compliance? In particular, is it okay to miss an event because there are other processes, outside Cloud Custodian, that you expect to be running? Or does Cloud Custodian also provide some kind of assurance that even if an event didn't get fired, we're still going to check compliance on something?

Yeah, so Custodian supports both: looking at the whole fleet and evaluating the state of everything at the moment, and also integrating with these event-based mechanisms. Generally speaking, event delivery seems to be fairly solid, something like 99.99 percent, but yes, there's a percentage possibility of something going wrong in the cloud, and at scale that becomes a certainty. Having the ability to do a periodic poll-based evaluation of the whole fleet is a baseline capability. It's really just changing the execution mode from being event-based, whether on CloudTrail or Azure Functions, to the default, which is simply to poll the resources.

So to clarify, is that a capability now, or something you're thinking about for the future?
It's a built-in capability now.

Okay, so people can decide how close to a hundred percent they want to be, based on the particular compliance check they're doing. Maybe if they're turning off an instance during off-hours, it's not that important if 0.001 percent of the time it doesn't happen. But if it's "my bucket is wide open to the internet," that's not okay.

Right. And some of the execution modes we integrate with do extra behind-the-scenes work. Say you're deployed as a Config rule in AWS: behind the scenes, that's doing the event stream as well as the polling, and then feeding that information back to us, so from a policy perspective it doesn't need to be duplicated as two policies, one full-fleet evaluation and one event evaluation. Generally speaking, the way users typically author policies is that they'll write a policy and run it against the whole fleet, and then they'll start adding in the event basis they want to execute on. So even in development, they're switching between modes seamlessly as they try things out.

Great. Maybe a question from chat next: Steve Hatfield asks in chat whether there are any roadmap items for reporting overall health, in contrast to only viewing non-compliant resources. Great question. Health is interesting; I'm not sure whether the context is general cloud infrastructure. The ability to filter with generic queries lets you operate on many different expressible parameters. We brought up the example around cost optimization, looking for oversized resources using metrics, but you can also flag resources in other ways.
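The authoring flow described above (write a policy against the whole fleet first, then bolt on an event trigger) looks roughly like this in policy form. This is a hedged sketch: the policy names, the `owner` tag convention, and the IAM role ARN are invented for illustration, while the `mode` block follows Custodian's documented CloudTrail execution mode.

```yaml
policies:
  # Pull mode (the default when no mode block is present): each
  # `custodian run` evaluates the entire fleet.
  - name: ec2-missing-owner-pull
    resource: ec2
    filters:
      - "tag:owner": absent            # hypothetical tagging convention

  # The same logic switched to event-based execution: deployed as a
  # Lambda fired by the CloudTrail RunInstances event, so feedback
  # arrives in seconds rather than on the next poll.
  - name: ec2-missing-owner-event
    resource: ec2
    mode:
      type: cloudtrail
      events:
        - RunInstances                 # shortcut for the EC2 launch event
      role: arn:aws:iam::123456789012:role/custodian   # placeholder role
    filters:
      - "tag:owner": absent
```

Because the filter block is identical in both, moving between modes during development amounts to adding or removing the `mode` section, which matches the seamless switching described above.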
There's a lot of operations work that Custodian tries to automate. In the context of AWS, you might have auto-scaling groups that weakly reference their resources. We've seen operational environments where the auto-scaling group is continually trying to spin up instances and can't because it's misconfigured, and Custodian supports detecting that, actually going through and validating it, as well as subscribing to additional event streams around instance launch failures. AWS is where Custodian started off initially, about four years ago, so it definitely has the richest integration. There's also support for the Personal Health Dashboard (which is a terrible name), covering the underlying data-center and cloud-infrastructure events around a service. You can subscribe to, say, any time an EBS volume is lost, or any time there's an outage-incident status-page update, and notify the application team whose resources are affected. So there's rich capability around doing operational work with Custodian. Those capabilities tend to differ across clouds, depending on what each cloud exposes natively that we can use as an event stream.

I don't know if Steven is still around to say whether that answered the question. It sounds like he had to drop off. It looks like we have another question.

Yeah, so that's a great question: policy mistakes. Custodian certainly has built-in safety belts. Obviously, any time you're doing remediation at scale, or mass operations around infrastructure at scale.
It's a good best practice to have some safety-belt capability, and some of this derives from the full notion of compliance as code. Most teams will set up CI infrastructure, say Jenkins, so that on a pull request it does a schema validation of the policy file and a dry run of the policies. A dry run does not do any remediation; it will not take any actions on resources. It simply shows you the set of resources you've filtered down to, and those results can be posted back to the pull request as a comment, showing what these policies are going to affect.

Andy touched on a tool we have called policystream. Compliance as code is all about workflows, and one of the things policystream does is make the history of Custodian policies machine-readable: you can look at the string of changes through a git history and say, these policies got added, these got removed, this one got updated. You get a diff of just the policy changes from a set of commits, and then you can execute a dry run on just that smaller delta.

Going back to belt and suspenders: Custodian has the ability to say, if we're ever going to touch more than 5% of the fleet, stop, or if we're ever going to touch more than N resources, stop. That's a built-in safety belt against a policy execution affecting a larger population than you might expect. And from an exception perspective (one of the common rules I've noticed is that every rule has its own exception), there's the ability to
pull in exception lists for policies and source them from URLs, from S3, from JSON feeds and CSVs, saying, for example, this particular image or these instances are exempt from these particular policies, using external integrations to define what that exception set is.

Thanks. Looks like Steven had a follow-up. Yeah, so as far as doing a whole-fleet evaluation, you can very much do that: you can evaluate the whole fleet, determine that these particular instances don't match this particular set of criteria, look at that as a whole, and then apply an aggregate filter that says, if this is N percent of the population, or if there are more than 50 of these things, then proceed to actions. The event stream is really tied to execution mode, but the default execution mode is effectively evaluating the whole fleet, and there are filters set up for doing group analysis across the set of resources, layered on as additional filters.

So we've just got 10 minutes left, and I wanted to make sure to allocate some time to answer your question, which is: what's next? I'd like to draw everybody's attention to what I added to the agenda. I'll actually do this a little backwards. It has come up a lot that it's not clear to many people what exactly happens in TOC onboarding: how does a project get onboarded, and how are decisions made in the CNCF? It's complicated by the fact that this has changed over time. But I have been in a bunch of these meetings, been around for a little while, and been on the receiving end of "I don't know what's going on" (even "how do we become a CNCF SIG?" lasted many months). And I've been a fan of this markdown flowchart maker. So I took a
So I took a Discussion from last summer that was in a you know diagram tool and I turned it into and I you know filtered in a bunch of other discussions that I've heard In various meetings and made this flowchart. So this is not adopted But this is my attempt to write down what I think everybody is talking about and nobody has said This is terribly wrong and so I thought I would just kind of tell everybody about this I don't think we have time to have a deep discussion about it, but but everyone in the group and Folks from cloud could study and you should feel free in fact obliged to look at this carefully and tell me I'm wrong and say where this doesn't make any sense and we can elaborate because I think there is a idea that The TOC would like this see it the SIGs to participate in Making it so that the TOC has more bandwidth by kind of pre-flighting all of the It does is this project a fit without making a decision but a recommendation, right? So that if projects are really clearly not a fit they get quick feedback if there's a bunch of discussions It can happen in parallel across the different SIGs I think that There's a there's some nuance to like can things get stuck in the SIG? What would happen if the project disagrees with what the SIG said blah blah blah, right? There's a lot of detail that would need to be worked out, but the truth is that the spirit of this is that There's a long queue of projects that would like to present to the TOC and then they present to the TOC but that Then there's like a Q&A discussion due diligence thing that there just isn't enough bandwidth on the TOC to do and So this is an effort to parallelize it. 
So this is my understanding of what I think we're doing, and we can go forth down this path. If you don't like this path, we can change it, but by default we'll go with my understanding of how things are going. Which is that, basically, you were sent to us: "hey, engage with the SIG, you're interested in becoming part of the CNCF." There are different ways you can engage. Of course, we were excited to have you present here; we'd also encourage you to participate, and we have a new-members page. That can help speed things up, because you're helping us get through our backlog. Also, I'd encourage you to look and see whether there are other projects where Cloud Custodian could be useful. You're already connected to Kubernetes, of course, but maybe there are other projects where it would make sense for them to use Cloud Custodian, or for you to use them. I think that's just generally a good idea, because those questions may come up if there is an obvious connection that may not be obvious to you if you're new to the CNCF, or even obvious to all of us.
There's a due-diligence process (which right now is multiple due-diligence processes) that we have on deck to sort out. In the next weeks, if you hadn't arrived, we would be working on clarifying that process, so I just want to let you know. As part of that, what we've talked about informally, me and a couple of people I've talked to, is that it would make sense for that process to include this self-assessment: a document generally produced by the project as a way to explain what the project is and what its security posture is. So one idea is, if you are enthusiastic and you think this would be helpful to you and to us, you could produce a self-assessment, which would speed things up. The other option is you could say, "wow, that looks like a ton of work, I'm not sure we're up for that," and wait until we decide whether it's required, or whether there's a lighter-weight process.

Yeah, a few questions there. I'm just trying to understand what the process is, because I've seen projects that are already in the CNCF go through SIG Security assessments, so it seems like that was independent from the inbound activity toward the TOC. I'm trying to understand: is that being defined as a prerequisite? And having looked at the process, it looks mostly like it's going through the CII Best Practices badge. Is there additional stuff beyond that you're looking for?

So, to answer your question:
We've done two assessments. in-toto's was done as a prerequisite to a TOC recommendation, and then there was some confusion about whether that was a good idea, and whether maybe we shouldn't require people to do an assessment yet. We did want to do assessments of the projects that were already in the CNCF, so we invited OPA for our second assessment, which was a very collaborative process. Ash, who is here, really helped steer that process so we could refine it. Our goal is this: we will definitely do assessments of all the security-related projects that are part of the CNCF, and then we're figuring out what our bandwidth is to assess projects that of course have security needs, but are not security projects. We're trying to figure out whether we have enough bandwidth to do all of them, or whether we'll prioritize them in some way; right now a lot of it is prioritized around projects that come to us or that we do outreach to. So we're exploring whether the security assessment will be recommended, or required, or "just think about doing this sometime," or anywhere in between. That hasn't been decided, and you've caught us in the midst of defining the process. We've basically said that until we've done five security assessments, evaluated our process of doing so, and know that if somebody comes to us, or the TOC says "do an assessment," we can say "yep, it'll be done in N weeks" (we're shooting for N equals three right now), we're not going to make it a requirement. Until we can assert it will definitely take that amount of time, you're in this interim stage. Does that help?
I'm still unclear on where to go from here, and I don't want to speak for Kapil, but it sounds like, as you're going through this process and trying to establish what exactly it is, this project could be used as kind of a test case to finalize or firm up what the process is. Or you could say, "whoa, wait, we'll wait until your process is done."

So, to answer your question about the self-assessment: the CII Best Practices badge is hardly the most important or biggest part of it, although it's a basic checklist, and if you're not doing 90% of the basic stuff, we're a little worried about you. It's less that you need to do it, because there certainly could be experimental projects accepted into the sandbox that we're excited about, where we'd just say, "FYI, this is a set of things you ought to be doing; queue it up before incubation." That's the least of your concerns, and my guess is it would not be arduous for any project that is security-focused, or at the maturity of Cloud Custodian. Most of the self-assessment is just a common format for "what does your project do?" The big part of it is the security analysis: what's your threat model? That's something that is rarely surfaced in a concise way for open-source projects.
So this is the meat of it: you explain to us what you think your security posture is, and what the potential threats are from adding your thing into the mix. When you're adding a project that's supposed to increase your security, that's a huge benefit, but of course you have to evaluate that it's also another attack surface. So we want to streamline that and have an opportunity to discuss it and think it through. That's the big part of it.

One thing I think might work well, and I'd really like your feedback, which we can do asynchronously because I know it's the top of the hour, is if you were to look at this and say: how much work would it be for you to produce that? Would that be a reasonable requirement? The actual self-assessment may or may not be necessary at this stage.

I think we could get that kicked out relatively quickly. I'm just looking at the impending holidays, and people heading out on vacation and family time. That's what puts us from your three-week target timeframe to, well, the holidays, and it's going to take, you know, two months.

Nothing before the holidays in any case, right. But the due diligence, and this is what we're exploring, maybe we have a lighter-weight due diligence, which is just: do we have any big concerns that would be blocking? Where are we on the spectrum from "no, we recommend the TOC not touch this thing" to "yay, please"?
"We love them," right; there's a spectrum, and I'm completely exaggerating how we would present this. We generally present things as "these are the benefits and challenges of this project." But what the TOC is really looking for is for us to prioritize different projects for them to look at, along with some well-ordered data about each project.

Okay, I think we could take a stab at that, and work with whoever is interested to help with the self-assessment and the threat models. Most of the threat models here are really around controlling access to the git repo that has the policies, what they execute with, and trying to hijack one of the deployed functions. But yeah, I can walk through that. And as far as the project description and the CII review, that's relatively straightforward. We already created a cloud-custodian Slack channel around the security assessment, so anybody who's interested, feel free to join that Slack channel and we can start coordinating on content. I believe it's mostly set up as pull requests to the sig-security repo.

Yeah, that channel seems like a good place to coordinate the activity. And Kapil, I can tell you we're highly motivated to help push this through, so we can add resources as necessary to streamline the process.

Awesome, yeah, for sure. And sorry, the last two assessments started with a Google Doc and then had people comment on it, but it could certainly start with a pull request; whatever your preference is for the starter doc.

Okay. Maybe, just because the discoverability of Google Docs isn't always great unless they're linked from somewhere, we start with an issue. Okay, and if you don't have an issue yet, there's a template for a security assessment.
So you would just kick that off, and there are a bunch of things to fill in, and one of them is the self-assessment.

Sounds good. All right. It's five past, so I'll close the meeting, but folks, feel free to chime in on Slack if there are things we didn't cover that we need to coordinate. Next week will be our more usual working-group meeting; I think there are some topics people are thinking of raising, and then we'll have check-ins and discussion of what's top of mind for folks. Thank you, everybody, and thanks very much, folks from Cloud Custodian, for a great presentation and discussion.

Awesome. Thank you.