Hello everybody. Good to see you all here. The topic of today's Tech Talk is our latest mini-initiative, called fas2discourse. The fas2discourse operator, well, we created it, and it serves to synchronize group membership from FAS, powered by FreeIPA, to the Fedora Discussion Discourse instance.

How did it start? In the beginning, there was a ticket from Matthew, where he got inspired by the solution of the Communishift authorization operator, which syncs IPA groups with the Communishift cluster. And he was thinking about using the same technical solution to sync IPA groups with the Discourse groups. It would allow us to use FAS as a single source of truth, and other things, like linking permissions for posting in certain areas to group membership, or granting Fedora contributors in certain groups a higher trust level.

So what does it do? Users in Fedora Discussion are added or removed according to their state in FAS. That applies to the groups that exist both in Fedora Discussion and FAS, and that's determined by matching names. A change made by the sponsors or members to a group in FAS will be automatically mirrored to Fedora Discussion. Non-matching groups and some service groups, like the Discourse admins and moderators, are ignored. So it's a one-way sync of FAS group membership to Fedora Discussion. Basically, if a user is added to or removed from a group in FAS, and that user has a Discourse account, the Discourse user will be added to or removed from that group. Also, if an account is not a valid account in FAS anymore, for whichever reason, the user will be removed from all groups shared with FAS.

The solution is very similar to what we used for the Communishift authorization operator. We didn't want to reinvent anything; we wanted to use as much as we could from what had already been written. For this reason, we used the same Ansible-based Operator SDK, with a playbook running the tasks, and we reused as much of the code as we could from the Python modules. The operator image is on Quay.io under the Fedora namespace, and the Molecule tests are currently designed to run only in a cluster. Running locally would require creating your own Discourse API key and a keytab file for kinit, and creating a secret. So that's why they only run in the cluster.

And how did we solve it? The playbook runs five tasks. In the first task, the operator retrieves secrets, such as the Discourse API key and hostname and the FASJSON hostname and principal, from the private Ansible repo, and populates the variables in the playbook. All configuration for the operator is in the private Ansible repo. There are also variables which the operator uses internally, like the keytab path, the principal, the ignored groups and so on. Those are populated by querying a Secret object in OpenShift. In the second task, the fas2discourse operator handles the Kerberos authentication to FASJSON via the keytab file. In the next task, the operator queries the Discourse API to retrieve the list of groups and the list of users in each group. In the fourth task, the operator queries FASJSON with the Discourse group list and retrieves the membership of each group in FreeIPA. And in the last, fifth task, using set operations, the operator figures out who is in the IPA group but not in Discourse, and adds them, and who is in Discourse but not in the IPA group, and removes them. IPA always wins. Anyway, the important note is that it's a one-way sync.
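To make that fifth task concrete, here is a minimal sketch of what the diff-and-sync step could look like in Ansible. The variable names, the group ID and the `system` API username are illustrative assumptions, not taken from the actual playbook; the endpoints are the standard Discourse group-membership API.

```yaml
# A minimal sketch of the sync step, assuming ipa_members and
# discourse_members are lists of usernames gathered by the earlier
# tasks, and that discourse_host, discourse_api_key and
# discourse_group_id are already populated. Names are illustrative.
- name: Work out who to add and who to remove
  ansible.builtin.set_fact:
    users_to_add: "{{ ipa_members | difference(discourse_members) }}"
    users_to_remove: "{{ discourse_members | difference(ipa_members) }}"

- name: Add members that are in the FAS group but not in Discourse
  ansible.builtin.uri:
    url: "https://{{ discourse_host }}/groups/{{ discourse_group_id }}/members.json"
    method: PUT
    headers:
      Api-Key: "{{ discourse_api_key }}"
      Api-Username: system
    body_format: json
    body:
      usernames: "{{ users_to_add | join(',') }}"
  when: users_to_add | length > 0

- name: Remove members that are in Discourse but no longer in the FAS group
  ansible.builtin.uri:
    url: "https://{{ discourse_host }}/groups/{{ discourse_group_id }}/members.json"
    method: DELETE
    headers:
      Api-Key: "{{ discourse_api_key }}"
      Api-Username: system
    body_format: json
    body:
      usernames: "{{ users_to_remove | join(',') }}"
  when: users_to_remove | length > 0
```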
So users added or removed by the admins directly in Discourse will be kicked out or added back again on the next loop. It's important to point that out to everybody who might be an admin of the Discourse instance, so that they really use FAS. And for the matching to happen correctly, we had to rename some of the groups in Discourse to exactly match the FAS ones. Also, only the users that have an account in Discourse get synced, not all the users that are in FAS.

The loop runs every 20 minutes now. This can be lowered to five or two minutes if we see the need, but we must be careful not to hit the rate limit in Discourse. The rate limit can be adjusted in every Discourse instance; by default, the maximum is 50 requests per 10 seconds per IP. If any of the tasks fails, the entire loop stops and retries. So we eliminate situations like the one we discovered during testing: when something fails, for example during authentication with the keytab, and we get back an empty list of groups, the operator does not go and wipe out all the users from the matching Discourse group. If a new group is added to Discourse and there is a matching group in FAS, it will start synchronizing the users.

The operator has been tested on the staging Discourse instance with staging FAS, and it was successfully deployed a couple of weeks ago to production. It's running in the OpenShift cluster. Here are the links. The code is in Pagure, under the CPE namespace, in fas2discourse, including some open issues with enhancements we could still be working on. There is a link to the SOPs, where you can find docs on how to install, build and release the operator, workflows for debugging, and so on.

And yep, there are ideas for how we could still enhance things. For example, linking permissions for posting in certain areas to group membership, or adding the users that are sponsors in IPA as admins in Discourse, which also poses a problem: as Matthew pointed out, in FAS the sponsors don't need to be members of a group, while in Discourse they have to be. So that's a thing we have to solve. We could also sync other account information, like time zone, region or pronouns, from FAS to Discourse, or disable the Discourse account when the account is no longer valid in FAS, in case there is a need for that. And one of the things we might look closer into in the future would be adding users into their groups immediately when they log in to Discourse for the first time, if there is a need to do that. We also have one stretch goal, which would be publishing the Ansible stuff to Galaxy as an Ansible Discourse module, which we figured out does not exist yet. And with that, that's all. Open floor for questions; we've answered a couple in the chat already. Yeah. Thanks.
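On that Galaxy stretch goal: nothing like this exists yet, so purely as an illustration, an invocation of a packaged Discourse module might look something like the sketch below. The collection name, module name and every parameter are invented.

```yaml
# Entirely hypothetical: neither this collection nor this module exists
# on Ansible Galaxy yet; this only illustrates the stretch goal
# mentioned above.
- name: Keep a Discourse group in sync with its FAS counterpart
  fedora.discourse.group_membership:
    host: "{{ discourse_host }}"
    api_key: "{{ discourse_api_key }}"
    api_username: system
    group: sysadmin-main            # invented example group
    members: "{{ ipa_members }}"    # desired membership, from FASJSON
    state: exact                    # add missing users, remove extras
```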
Go ahead, Kevin. Just real quick, could you maybe talk about why this is an operator as opposed to a project? What was the reasoning there? How do you mean, a project? Well, just an OpenShift... Yeah, so what is the advantage of an operator over just an application? And I know we've talked about this before, but I think it would be interesting.

So it is a project, and it is just an application. The only thing is, an operator is just an application that's built using a design pattern, the operator pattern, and it's built using the Operator SDK, which is like a framework. So basically, when we first started this project, myself and Lincoln ran a couple of commands on the command line, and what that does is automatically create the framework, the skeleton, of the application. It creates a whole bunch of files. And then if you look at what we actually added to the project, we only added Ansible. So basically, what we got for free was all of the code that handles the plumbing: being able to talk to OpenShift, communicate with OpenShift, retrieve secrets, or interact with the OpenShift API. All of that kind of heavy lifting is already taken care of and handled, and the only thing you have to worry about then is actually just working in some Ansible. And anything that the Ansible doesn't cover, you can fill in yourself; we did write a couple of Ansible Python modules, just to interact with the Discourse API and whatnot. So that's kind of why we went with that. You get a hell of a lot for free just by using this framework.

And the whole loop, the 20-minute loop, is very easily configurable. You can change it to whatever you like, but we just wanted to have 20 minutes initially at least. So from that point of view, it's very similar to the Fedora toddlers system: you have a short-lived job that you can run on a loop, or, you know, activate whenever you need to. Does that answer it, Kevin?

Yeah, yeah, this is great. That might be a pattern we want to use moving forward for other things too. So yeah, thanks.

Yeah, what I love about it is the fact that it's mostly Ansible, so Ansible and Python, which are probably the biggest skill sets we have on the team, right? So anybody, once they see how these things are put together, will find them very, very easy to look at. If you just go into the roles folder inside the operator code, you can see everything is self-contained in that role. It's just a couple of Ansible playbooks, with the library files, the Python code, bundled with it. And of course, if you have any dependencies or whatever, you build them in when you're actually building the operator container itself. It's built from the Operator SDK image itself, so you have full control over this thing. You can do whatever you like with it, add any dependencies, though obviously it'd be better if you didn't have to add anything custom and could just depend on the upstream container. Yep.

So, a question from me: you mentioned that there are API limits, which we don't want to run into in Discourse. Is there a way in the operator pattern to act not on a time-scheduled basis, but on an event-triggered basis?

Yeah. So basically, the whole point of the operator is that we've actually added a new API to OpenShift. By installing this operator, we've created... yeah, what's it called? Let me put it in the chat: FasDiscourseConfig. We created this new object called a FasDiscourseConfig. It completely depends on how you've developed the thing, but you could very easily add something that could then react, you know, in a reactive nature. If there was something like a listener hooked up to Fedora messaging that would then create or edit one of these objects in OpenShift, the operator could take that and run an Ansible task or an Ansible playbook to do something based on that action. But in our case, it doesn't do anything like that; a single instance of that object will actually just make the operator loop every 20 minutes and reconcile everything. But there's nothing to stop us adding something, I don't know, like a FasDiscourseUser object or something like that, which might go and update a particular individual user when one of those objects is created, and then, when it's done, delete it. Thanks.
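For reference, a minimal sketch of the objects being discussed here. The API group and version are assumed, not taken from the repo, and the FasDiscourseUser object is the hypothetical extension just described, not something the operator ships today.

```yaml
# A minimal sketch of the custom resource the operator watches. The
# API group/version here is assumed, not taken from the actual repo.
apiVersion: fas2discourse.fedoraproject.org/v1alpha1
kind: FasDiscourseConfig
metadata:
  name: fas-discourse-sync
  annotations:
    # Ansible-based operators let the reconcile interval be overridden
    # per object with this SDK annotation (here, the 20-minute loop):
    ansible.sdk.operatorframework.io/reconcile-period: 20m
spec: {}   # a single instance just drives the periodic sync loop
---
# Hypothetical, as discussed: an object a Fedora messaging listener
# could create to trigger a one-off reactive sync of a single user.
apiVersion: fas2discourse.fedoraproject.org/v1alpha1
kind: FasDiscourseUser
metadata:
  name: someuser
spec:
  username: someuser   # the operator would sync this user, then delete the object
```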
Okay, so I have a question. What will happen if the membership of a user is changed on the Discourse side? Then it will get overwritten. Just overwritten, okay. If a person is added on Discourse, within at most 20 minutes they will get wiped out, and if they get deleted, they will get re-added. Okay. Yep, so, one-way sync. Yep.

I was just surprised, because I thought that if there is any group on the Discourse side that isn't in the ignore list, it will actually crash the loop. I probably misspoke there; it won't crash the loop. What will happen is it'll just be ignored, because when the code goes to IPA and says, give me all of the membership of this particular group, it'll just return an empty list. And that's what we want, because, I mean, if somebody creates a group in Discourse which is not in FAS, so be it; it just won't get synced with anything. Yeah, that's great. That is how it should work. Okay.

I see that James is asking: so I wondered, you have backups and you can retry errors and things like that. Do you just log those? Or do you put them in any kind of monitoring thing?

Just logging at the moment. We do expose some default metrics, but at the moment there's nothing scraping them, because we don't really have a good solution for it. We could get them captured by the OpenShift user workload monitoring stack; I mean, it's already installed and ready to go, and it's really trivial to get them hooked up and scraped. But from there, basically nobody's actually viewing it. So as far as I'm aware, unless it creates an alert or something, there would be no alerting on top of it, for example. It would just be metrics that go into the ether. We do have some alerting for other stuff, other OpenShift apps, for things like crashed pods, and it triggers an alert and sends an email, but I'm not sure if this could be hooked into that. It could be, yeah. I need to look at that.

I'm actually also working, at the moment, on getting Zabbix installed. So I have Zabbix installed on a RHEL 9 box in staging, and just today I managed to get the database working, and I have the Zabbix server hooked into it. I'm trying to figure out now how to expose the GUI so we can actually play around with it. And there's already a whole bunch of work done on getting the Zabbix agents reporting back to an instance, so I'll have to see, can I check if that's all working. But once that's done, I'd like to get the OpenShift stacks reporting back into the Zabbix server, and then at least we'll have one place where Infra will be able to look and see everything, you know, in one place. At the moment there's a whole bunch of jigsaw pieces on the table that are not actually talking to each other. So it's a work in progress.
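For context on what "trivial to hook up" would mean in practice: scraping the operator's default metrics with the user workload monitoring stack is roughly one ServiceMonitor object. A minimal sketch, with the namespace, port name and label assumed rather than taken from the deployment:

```yaml
# A minimal sketch: scraping the operator's default metrics via the
# OpenShift user workload monitoring stack. Namespace, port name and
# label values here are assumptions, not taken from the deployment.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fas2discourse-operator-metrics
  namespace: fas2discourse-operator
spec:
  endpoints:
    - port: metrics        # name of the port on the scaffolded metrics Service
      interval: 1m
  selector:
    matchLabels:
      control-plane: controller-manager   # default Operator SDK label
```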
There was a question by Kevin and Aurelien from before: why weren't we using Fedora messaging to get the groups? I know that we had a discussion about it, and I remember the decision, but I don't remember the reasoning behind it.

So the reason: initially, I just wanted to get a system that wasn't heavily dependent on, or too heavily integrated with, some of these other systems, just to get it over the line and delivered. So I would prefer to keep the initial system just running on this loop. I'd prefer to extend it later, like the thing that Neil asked about: have a kind of reactive API. So basically, extend the operator to be able to handle reactive requests. We would have something like a listener hooked up to Fedora messaging, and I'd like that listener to then create something like a FasDiscourseUser object. The operator would take that object, find out the name of the user that needs to be actioned, whatever the action is, go do it, and at the end delete that object and clean it up. I'd prefer to do it that way rather than delaying the release of the whole thing. That's the answer, at least.

We can also reduce the 20 minutes; from testing, I think we can reduce it down as far as one minute, but realistically somewhere between one and five minutes. At the moment, we just have it running on 20. So it can be changed later, if the user experience is not good. But so far no complaints, right? I didn't register any. Nobody knows where to direct the complaints.

Okay, and there are no more questions. I'll stop the recording.