All right, we'll go ahead and get started here. Hello everyone, my name is Terry Bowling. I am on the RHEL product management team for Red Hat Enterprise Linux at Red Hat, and my area of focus is RHEL automation and management: how can we make RHEL easier for you to manage and configure? And of course, everything we do also translates to CentOS and Fedora. And with me is Pavel. Would you like to introduce yourself?

I'm Pavel Cahyna. I am a developer on the system roles project, and I'm a senior software engineer in Core Services in the RHEL business unit at Red Hat.

So today we're going to talk to you about Linux System Roles. The outline of this talk: an overview of the system roles, then an introduction to two examples, the network and storage roles, because those are the two roles where we're getting a lot of feedback that they're among the most commonly requested types of automation when managing RHEL. Then Pavel is going to show you some demos of how to use them, and then we'll talk a little bit about some of the common role challenges.

So, the reason we're here: I spent about 14 years as a sysadmin for a few different companies, so I've kind of been there with you, managing and configuring operating systems and taking care of things when they go bad. And back in the day, I wrote a lot of really bad shell and Perl scripts. I was very proud of them at the time, but I pity anybody who had to take them over after I left. They got the job done; they helped me do what I needed to do. But I can completely relate to these examples where we need to do something, so we write some clever script. In Perl I loved regexes, and I did a lot of horrible things, but it was fun. Some of them were wicked cool. I didn't always remember a year later why I wrote some of it, but it worked at the time. But as things change, stuff broke.
And we see that all the time with RHEL: the transition from RHEL 4 to 5 to 6 to 7, and all of the rapid changes that happen in Fedora. Things change, and your scripts need to be maintained. And of course, the cycle is rinse and repeat. So one of the questions we're asking ourselves is: how can we make this easier? As things change, how can we, Red Hat and the Fedora ecosystem, take on some of that technical debt, so that we maintain this and you don't have to?

So our goal is to provide a collection of Ansible roles that function as a consistent configuration interface to RHEL and Fedora. And when I say RHEL, you can insert CentOS there; I'm just usually saying RHEL and Fedora, so I'm not ignoring CentOS. And how can we ensure that they're compatible with, and tested on a regular basis against, Fedora, RHEL 7 and 8, and future versions? Some of them are also compatible, where appropriate, with RHEL 6 today. And how do we ensure that they stay compatible and tested as things change?

So we're abstracting configuration from implementation. What I mean by that is: as tools and utilities change, networking is a great example. Your application needs an IP address, and maybe a bonded pair or multiple aggregated network interfaces. A lot of the time our sysadmins don't necessarily care about bonding versus teaming, or initscripts network configuration versus NetworkManager, versus other things in the ecosystem like systemd-networkd, or maybe some future crazy, wild idea that somebody dreams up that turns out to be awesome. All the sysadmin cares about is: this team is asking me to stand up this database and it needs bonded networks. All I care about is the IP address and multiple NICs. That's all I care about. I don't want to care about all the utilities and all the low-level technologies. I'm overwhelmed. I've got too much to do. There's too much to learn. I can't absorb it all.
So historically we, the community, have put that technical debt on users. So we're abstracting this. We're giving you a simple way to describe: I've got an IP address, I want a bonded pair, just make it so. I don't care about the details; just make it work so that I can go to lunch with my friends. And that's what we're doing.

It's also critical that this is maintained by Fedora and RHEL engineers. We couldn't just go to the Ansible team and say, hey, do this for us, because they're not experts in the networking stack. They don't know what's going on in networking upstream and in the community; they're experts at designing an automation framework. So our RHEL and Fedora subsystem engineers need to be the experts on each subsystem, and on knowing what's being developed and how. So we're working with a number of different teams to develop this, and putting in automated CI testing in Fedora, so that as these things change and evolve and new functionality gets added, we can maintain that automated testing against RHEL 7 and RHEL 8 and future versions.

So what we have today, in Galaxy and on GitHub, is the Linux System Roles project. From that, we publish to Galaxy, so if you're an existing Ansible user you can simply type ansible-galaxy install linux-system-roles.network, or .firewall, or whichever one you want. We also package them in Fedora as the linux-system-roles RPM package, and in RHEL they are packaged as the rhel-system-roles RPM package. The reason for that is we have use cases where we need to make these accessible from, for example, the DVD ISO image, because people might want to do automation where they don't necessarily have network access. Not everyone has internet access and can install directly from Ansible Galaxy. So we're looking at a variety of other things, like the Automation Hub that the Ansible team is working on, to provide multiple ways to access it.
The versions that we package are what we can call the tested versions that we're packaging and supporting for enterprise use. But the same team is working on the same code, and the Fedora CI testing is doing a lot of the testing, so the content in Galaxy is really good too. It's just that in Galaxy you're going to see change at a faster rate, until we package it and ship it, for example, into RHEL. So: multiple ways of accessing it.

What we provide today is network, storage, SELinux, timesync, Postfix, and kdump. With RHEL 8.1 we started providing a workload role for basic SAP HANA configuration, and FreeIPA identity management is also available in Galaxy. And I think, where's Thomas? I think we're going to be shipping that in RHEL soon. Yes? Oh, it is in 8.1. Awesome, thank you. I didn't realize we were actually shipping it, so my apologies. So FreeIPA, if you're not familiar with it, is a really awesome identity management system. I think they have a whole other presentation. Thomas, you're going to be presenting on automating IdM, right? So look for that session; he'll be using these as examples.

Some other things we're actively working on: scoping out requirements for a firewall role. We already have a proof-of-concept firewall role in our project in Galaxy, so you can try it out and test it. We're making some revisions to it before we declare it stable and start shipping it as an RPM package. Then there are additional application workload roles, for Microsoft SQL Server and things like that. The logging role will help you stand up or standardize logging profiles: if you have high-performance or lossy logging use cases, or you're setting up a centralized rsyslog server, or setting up RHEL to log to Elasticsearch, it will help you take care of all of those different scenarios. So lots of stuff: kernel settings, bootloader. We've got a lot of things we're excited about that we're working on.
So you'll see those gradually appear in Galaxy and Fedora. And I think we have two or three lined up for the RHEL 8.3 release, so we're pretty excited about the new stuff we're working on. That's the general overview, and now I'll hand it over to Pavel for the demo.

Thank you for the introduction. As Terry said, the main feature of Linux system roles, compared to some random roles that you might find, is supporting multiple releases and hiding the changes away from you. So how exactly are we doing this? One class of changes is simple: package renames, renames of configuration files, renames of services. Those are easy enough to handle. But then we have more substantial changes, where the complete implementation of a given functionality may be replaced by something else. Those implementations of a given functionality we call providers, and our roles support multiple providers where appropriate.

The canonical example is our timesync role. Note that it's not called the chrony role, because it abstracts away the details of chrony, and it supports both ntpd and chrony. So we have two providers for the timesync role, and they are configured in the same way: you set the same variables and just specify the provider, and it writes out the chrony or ntpd configuration as appropriate. Another classic example is the network role, where we can configure networking via the old initscripts or via NetworkManager. For logging we have just one provider, rsyslog, but it's prepared to accept other providers. I'm not claiming that we will ever support another one, but the possibility is there. And the providers, as I said, implement the same interface, or at least a common subset of that interface, because of course not all providers will support all possible features. So now I will introduce some of the roles.
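To make the provider mechanism concrete, a minimal timesync playbook that picks the provider explicitly might look roughly like this. This is a sketch: the host group and server name are placeholders, and the exact variable names should be checked against the role's README.

```yaml
# Sketch: apply the timesync role with an explicitly chosen provider.
# The rest of the variables are identical for either provider.
- hosts: all                       # placeholder host group
  vars:
    timesync_ntp_provider: chrony  # or "ntpd"; omit to let the role pick the default
    timesync_ntp_servers:
      - hostname: 0.pool.ntp.org   # placeholder NTP server
        iburst: yes
  roles:
    - linux-system-roles.timesync
```

Switching the provider to ntpd would leave everything else unchanged, which is the point of the provider abstraction.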
I will not introduce all of them, but a particular subset. First is timesync, which is a medium-complexity role, let's say. The interesting thing about timesync is that it supports those two providers. It accepts a timesync_ntp_provider variable, which can be set to ntpd or chrony, and then the rest of the configuration is the same. What happens if you don't specify this variable? Then it chooses an appropriate default for the given system: if a timesync service is already running, it respects your choice; if none is running, it picks the default. For older versions, like RHEL 6, the default would be ntpd, but for newer versions, RHEL 7, RHEL 8, and recent Fedora, the default is chrony. So this is the example playbook for a complete application of the timesync role.

Then networking. This is one of the most complex, and I would say one of the most useful, roles, but I will show just one slide about it, because there's another talk by Till on Sunday, and we also talked about it last year. As I said, the network role also supports providers: NetworkManager and initscripts. The default for RHEL 6 is initscripts; for newer releases it's NetworkManager. What is important is that the network role doesn't manage interfaces directly, but rather connection profiles, which are a concept in NetworkManager; with initscripts, the corresponding configuration files are mapped to connections. It accepts a list of connections, and the features it can set are many. One of them is the runtime state, up or down. There's the persistent state, which means the connection is present or absent on the system. It supports Ethernet devices. It supports IP protocol configuration, both for IPv4 and IPv6, automatic via DHCP or static, like here. It supports bonding and teaming; bonding is the more established option, teaming is, let's say, the newer one. It supports VLANs. Here's an example of a bond: we configure a bond and one member interface of the bond.
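A sketch of such a bond configuration, with illustrative names and an illustrative address. The key names follow the role's documented schema as I understand it for releases of that era (newer releases of the role renamed master/slave_type to controller/port_type), so check the role README before relying on them:

```yaml
# Sketch: a bond profile with a static IP, plus one member Ethernet interface.
- hosts: all                  # placeholder host group
  vars:
    network_connections:
      - name: bond0
        type: bond
        ip:
          address:
            - 192.0.2.10/24   # illustrative static address
      - name: bond0-port0
        type: ethernet
        interface_name: eth1  # illustrative member NIC
        master: bond0         # attach this interface to the bond
        slave_type: bond
  roles:
    - linux-system-roles.network
```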
It also supports bridges, InfiniBand, and MACVLAN. So, an example: as I said, a list of connections. This is a bond with a defined IP address, and the member of the bond. We can say which interface name the connection should be bound to, if it's not equal to the connection name. And we apply the network role with those variables.

So now the storage role. This is quite new, so I'm going to speak about it longer. The principle is to simplify local storage configuration. This means providing a concise model to describe the storage layout, meaning the model of variables accepted by the role. It also means providing reasonable defaults. In Fedora today, the default file system is XFS and the default storage layout is LVM, but this may change; maybe in the future it will be Stratis. So we don't require users to specify what they want, and we apply the default for the system they are running. But of course, Stratis and LVM would have to be managed by the same interface to have this consistent experience, and this creates a requirement for the concise model: it has to be abstract enough to cover multiple implementations. Related to that, it handles non-essential details automatically, like whether or not to create a partition. If you don't specify, it chooses a default, which currently is to not create a partition but to create LVM directly on the disk, to simplify the layout; but this may change. And it reuses the existing storage management logic. What does that mean in practice? It uses the Blivet library, which is also used by the Anaconda installer, so the layout and the details should be the same as those created by the installer. Unfortunately, this has one consequence: the Blivet library is not available at the required version on RHEL 6, so we can manage only RHEL 7 and RHEL 8, and of course recent versions of Fedora. So now some examples.
A simple example: we have an example playbook which creates a file system directly on the disk, without volume management. We have a volumes variable which says that we want to mount it as /backup, and the disk should be sdc. The file system type is commented out, because the role chooses a suitable default, which is XFS.

That was just a whole disk. Now what about volume management? For this, we get back to the consistent configuration, so we chose an abstract model with two layers of configuration: we have pools, and inside the pools we nest volumes. For LVM, a pool is just a volume group and a volume is a logical volume, but for other volume managers the terminology may be different, and we abstract it in the same way. LVM is the default pool type, and actually the only supported one right now. So here we create a volume group on those two disks, and inside the volume group we create two volumes with a given size. Again, the file system type doesn't need to be specified if you don't want to. We mount them, and we also configure them in /etc/fstab so they are mounted on boot; this is all handled by the role. If needed, we can provide the file system type explicitly, so you can make sure it is ext4, or whatever is supported on the system, in case the defaults change. You can set a file system label, file system creation options, or mount options. Those will of course be file system specific, so when you do this, you must make sure you understand the appropriate options for the given file system.

As for status: we have a stable version already released in Galaxy, version 1.1 I believe, and it was recently included in RHEL 8.1. What is supported is what I showed: whole disk, disk with a single partition, and basic LVM support.
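Put together, the LVM example described above might look something like this. This is a sketch: the disk names, volume group name, sizes, and mount points are illustrative.

```yaml
# Sketch: one pool (volume group) spanning two disks, with two volumes.
- hosts: all                       # placeholder host group
  vars:
    storage_pools:
      - name: app-vg               # illustrative volume group name
        disks: ['sdb', 'sdc']      # illustrative disks
        volumes:
          - name: data
            size: '10 GiB'         # size format per the role README
            mount_point: '/srv/data'   # also written to /etc/fstab by the role
          - name: logs
            size: '5 GiB'
            fs_type: ext4          # explicit type; otherwise the default (XFS) is used
            mount_point: '/srv/logs'
  roles:
    - linux-system-roles.storage
```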
And what we will support, in no particular order: encryption; RAID and LVM thin provisioning (currently volumes are just classical LVM logical volumes); multipath; also LVM RAID in addition to MD RAID; VDO compression and deduplication; and possibly a host of other features, like other volume types such as Stratis, if there's demand for it.

So now about some challenges in the storage role. I will first speak about the most important one, because even if the storage role supports what you need, supports the system that you need, and has a nice, logical, abstract layout of configuration variables, if it happens to destroy your data, you will still probably not be very happy with the result. So the most important challenge is not destroying your data. This was actually quite a challenge. The role doesn't remove volumes which are not specified in the variables. But what about conflicting volumes? If you already have a volume on the system, and you specify that you want a volume with the same name but a different file system type, we cannot convert from XFS to ext4 or vice versa, so the role would have to delete the volume and recreate it. Or let's say, by mistake, you give it a disk where there's an existing partition or an existing file system, and you want to create LVM on it, or vice versa. You don't want the disk to be wiped in this case. Of course, detection is not 100% reliable in those cases, but at least when we can detect it, we don't remove anything, because we have a variable, storage_safe_mode, which defaults to yes and tells the role not to do such possibly unintended and unsafe operations. So this prevents removing and recreating existing objects. But it doesn't protect against intended removal: if you say that you want such a volume to be absent, it will be removed, because in that case it's presumably not a mistake. It's what you asked for.
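For illustration, an intentional removal might be expressed like this (a sketch with illustrative names; the point is that safe mode at its default still allows it, because the removal is explicitly requested):

```yaml
# Sketch: explicitly removing a volume. Safe mode blocks *implicit* destructive
# changes (e.g. file system type conflicts), not removals requested via state: absent.
- hosts: all                  # placeholder host group
  vars:
    storage_safe_mode: true   # the default; shown here for clarity
    storage_pools:
      - name: app-vg          # illustrative volume group
        disks: ['sdb', 'sdc'] # illustrative disks
        volumes:
          - name: scratch
            size: '5 GiB'
            state: absent     # explicit request, so the volume is removed
  roles:
    - linux-system-roles.storage
```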
The other challenges are more about future development: things that people would probably like to have, but that are kind of hard to implement. One is automatic device naming. When one wants to create a file system on LVM, one needs to supply the mount point and also the logical volume name. The idea is to consider not having to supply the logical volume name, and having it deduced automatically, say as a default derived from the mount point name. Also automatic sizing: if you want to use the full size of the disks, it shouldn't be necessary to specify the size. Another is automatic disk selection, that is, unused disks: you shouldn't be forced to specify disks by name, which can even change; it should be possible to just say that you want all the disks, or three disks, and the role would create the layout on them. And also sizes specified as a percentage of the total space.

Why are these challenging? Because they're easy enough for a new deployment, but then the question is what to do when the system changes. If we choose to create LVM on all the disks, and then you add a new disk, what should a subsequent application of the role do when the volume group is already there? We think the key is to preserve the current configuration when it's already there, and not add new disks. Also, for percentage-based sizes, we of course cannot support a percentage of the free space, because if we say 100% of the free space, the next time there will be no free space, and with any other percentage it would not be idempotent either. This means we have to specify percentages of the total.

So now I will do a demo. The objective of the demo is to configure a VM so that it has one more configured network interface. I will set a large MTU on this interface. I will create a VLAN. I will access a disk over this VLAN. I will create LVM on this disk and mount the logical volumes.
I will export them over NFS, and I will enable the firewall, and NFS must still be working after enabling the firewall. I will actually use a recording, because the network is too slow here. So let me show you the playbook. Can you read that? Do I need to make the font larger? I'm afraid this is the biggest it gets. We do have all of this in a GitHub repo so that you can access it later; the link is at the end of the slide deck.

Let me show you the assumptions for this example: I created an iSCSI target listening on the host. I will not show how to do this, because I don't have time; I will really show only the configuration of the virtual machine. It uses some variables which say which iSCSI server to connect to and which directories to export. First of all, I create the network configuration. I set up the Ethernet interface with the large MTU. I could bind the profile to the interface via the name, but I don't need to, because the name is the same; I could also bind it via the MAC address. I configure an IP address on it, and I configure a VLAN on this Ethernet interface: I specify the parent, which refers to this one, the VLAN ID, and also an IP address on the VLAN.

Then the iSCSI initiator configuration. For this, we don't have a role yet, so I do it, let's say, manually, using standard Ansible modules. I start the iSCSI daemon and make it connect to the iSCSI target, and this provides the names of the disks that it will create. Finally, I use the storage role to create the volume group on those disks. So the storage pool, the volume group, is called export-vg, and it has two volumes: shared, with a given size and a given mount point, and users, where I change the file system type, also with a given mount point. I then make the directories world-readable so we can access them over NFS without worrying about authentication. And I export them via Oasis Roles.
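The network portion of that demo playbook might look roughly like this. This is a sketch: the MTU value, VLAN ID, and addresses are illustrative (the exact values aren't stated here), and the key names should be checked against the network role's README.

```yaml
# Sketch: Ethernet profile with a large MTU, plus a VLAN on top of it.
- hosts: vm                    # placeholder host group
  vars:
    network_connections:
      - name: eth1             # profile name matches the interface name,
        type: ethernet         # so no explicit binding is needed
        mtu: 9000              # "large MTU"; exact value is illustrative
        ip:
          address:
            - 192.0.2.2/24     # illustrative
      - name: eth1.100
        type: vlan
        parent: eth1           # refers to the Ethernet profile above
        vlan_id: 100           # illustrative VLAN ID
        ip:
          address:
            - 198.51.100.2/24  # illustrative
  roles:
    - linux-system-roles.network
```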
Oasis Roles is our sister project; we don't have a Linux system role for NFS exporting, so I'm using this one. It can also be obtained from Ansible Galaxy. I'm sharing those two directories to the VM host. And finally I'm using the network role, sorry, the firewall role, also part of Linux system roles, to enable the NFS service and to enable firewalld.

Now we're applying the whole playbook. I'm showing a recording because, with the network here, I found out it's too slow: it hangs at installing packages, and it has to install packages multiple times. The networking part has already been applied at this point, and it tested the connectivity; it does that every time, because you may have changed your management interface. Now we're setting up the iSCSI initiator. And now we're applying the storage role. This is a complex role with lots of tasks, so it prints lots of messages. Now it has finished mounting those file systems, now we're exporting the directories over NFS, and finally we allow NFS on the firewall, and we've exported those NFS shares.

Here I'm on the virtual machine host, and I'm mounting one directory from the VM, the exported shared directory, and I show that one can see the test file that I created in this directory. Now I'll show how it looks inside the virtual machine. These are the network interfaces: you can see eth1 with the large MTU, and also the VLAN with the appropriate IP address. I'll show the configuration of the disk devices: you see the disk with the two logical volumes mounted at those two mount points. And finally, I'll show that this is really the iSCSI disk; this is a utility which shows the iSCSI properties, and you see it's indeed iSCSI. The other demo I will skip, we don't have time anyway, and also the author of the logging role couldn't come to DevConf.

So, how many of you would consider using Linux system roles now? Not everybody?
And how many of you are already writing Ansible roles, or would consider writing Ansible roles? I see there are definitely some people who wouldn't consider Linux system roles. So at the end, I will give some notes about implementing Ansible roles which are not specific to Linux system roles; even those who implement their own roles can maybe learn something.

The challenges mainly revolve around respecting the previous setting, the previous state of the system. And I'm speaking now about interface challenges, because we want to provide a stable interface. That means we shouldn't change it: if we provide a way to hide the changes in the underlying system, but then we change the role, we recreate the same problem. So we have to design the interface in a stable way, and these are interface design challenges. The implementation we can always change, as long as we respect the existing interface.

About respecting the previous setting, this is actually an interface question: whether we want to declare the complete state, or only to declare changes to the system. Let me show an example using the SELinux role, which I have not shown, but it's simple enough: it can set SELinux booleans. The question is, when you list booleans, whether the role should set only those booleans and drop all previous modifications, resetting the system to a clean state, or whether it should just add this one boolean and keep all the previous ones. Suppose we chose to drop all previous modifications and set only the listed ones. We have, say, a Samba playbook which sets one boolean for Samba. But now suppose we also have an NFS playbook which sets some boolean for NFS, and we want to combine the two playbooks, or run them from one master playbook, or apply both to the same host. What happens? The second run just clobbers the boolean set by the first.
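To sketch the scenario (the boolean names are illustrative, and the selinux_booleans variable shape follows the role's README as I understand it):

```yaml
# Sketch: two playbooks, each setting one SELinux boolean via the role.
# With "apply changes, preserve the rest" semantics, applying both to the
# same host leaves both booleans enabled, instead of the second run
# clobbering the first.
- hosts: fileservers                   # placeholder host group
  vars:
    selinux_booleans:
      - name: samba_enable_home_dirs   # illustrative Samba boolean
        state: 'on'
        persistent: 'yes'
  roles:
    - linux-system-roles.selinux

- hosts: fileservers
  vars:
    selinux_booleans:
      - name: nfs_export_all_rw        # illustrative NFS boolean
        state: 'on'
        persistent: 'yes'
  roles:
    - linux-system-roles.selinux
```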
So for this reason, at least by default, in those cases we always preserve the previous state and only apply the changes to it.

Hey Pavel, we're almost out of time. Could we ask the audience if they have any questions and get their feedback? I think so. So yeah, does anybody have any questions? And then if we have time, we can go through a little bit more. Yes? Oh, of course. He was asking, is there a place where all of the variables and input parameters are documented? Do you want to take it? So each role has a README file, which you can see on GitHub; it is reproduced on Ansible Galaxy, and it's also included in the package. It's also formatted as HTML, so you can see what the role does and what variables it supports.

In the storage role, you mentioned a feature about automatically picking drives. Isn't that dangerous if you run a role multiple times? I mean, they are supposed to be idempotent, right? So if I run it a second time, it shouldn't change anything, but it seems like it would always grab the next available disks.

Yeah, that's a good question, and that's why I listed it among the challenges, let's say the challenging features. We think the right answer is to only be able to say that you want all the disks; then on the next run there are no free disks, but of course the volume group is already there, so you don't need to do anything. But that said, even if we picked only some of the disks, the second time, as the volume group is already there, the role will not modify it. That's what I mentioned: we want to make sure that the second time it runs, it doesn't do this automatic detection, because it already has something and it's satisfied. So in this case it would see that the mount point and the associated drive already exist, and it would leave them alone. Yes, yes. Okay.
Yeah, do you want to answer that? So the question was: when you use the roles statement in the playbook, it exposes all the variables to the whole playbook run; how safe is that, and shouldn't we use include_role instead? The answer is that this is why the role variables are always prefixed with the role name, to at least reduce the possibility of accidental collisions. But I'm afraid the include_role statement also keeps the variables, doesn't it? It's isolated. But if you use import_role, it behaves like the roles statement. Okay, so there's a difference. So it is definitely safer to use include_role, but we are working under the assumption that one would use the roles statement, so the roles should be safe even then. Another question? Any other questions? Yes, sir.

You just showed that timesync supports ntpd and chrony. Do you plan to support systemd-timesyncd as well? I don't think so. We do not yet support systemd-timesyncd, do we? Well, we definitely don't support it right now, but I think it depends on the demand whether we will. Right now we don't have any specific plans to do it. So if that's a feature you would like to request, you can go to our GitHub repo and open an issue. Any feedback that you have to share, anything that you would like out of these roles, we'd love for you to open an issue on GitHub, or a Bugzilla at bugzilla.redhat.com, and let us know what you need out of them. If they don't meet your needs, we're very, very interested in your feedback. So let us know what features you want us to add. Yes, sir.

Yeah, we have a few basic examples. The slides will be made available as a PDF, and at the end is a link to our GitHub repo, where there is a devconf demo folder that includes a couple of different demos right now. There's one that uses the roles as an example for setting up an Image Builder node, and Pavel's will be uploaded later today, I think. It's already there.
Oh, it's already there. So we have a few examples. For those workload examples, we also want to have an overall end-to-end solution playbook that demonstrates using these to set up, for example, Microsoft SQL Server, taking care of all of the different things you need to take care of. So we only have a few basic examples right now, but over the next year we expect to have more. Also, you don't need to adopt all the roles at once; you can start one by one, because right now they are pretty independent. Yes, sir.

Yeah, the question is: can I specify the block device by UUID, or by, say, the PCI bus path? Do you want to answer that? Some of these methods are supported, but I'm not entirely sure which are and which are not. At least I think the UUID links are supported, but it needs to be the device UUID and not the UUID of the file system. Because there are both, yeah, this one, okay.

Yeah, actually, so not today, but that's one of our goals. For example, as a database administrator, the storage team gives me a new 100-gigabyte or one-terabyte LUN, and all they give me is the WWID, the worldwide name, or whatever you want to call it. So how can we provide you multiple ways, both for networking and storage, of expressing the device that you want to use? For storage that might be the Linux block device name, the multipath name, or the WWID that the storage array presents, and then let the role figure out the multipath names and everything, so that you as an admin don't have to worry about it. You just plug in the WWID, hit go, and it takes care of that for you. So that's one of our goals and aspirations: to require the user to provide only the details they must provide or want to express, and not require you to specify all the typical default values.
Let's assume recommended defaults, so that you don't have to express all the details unless you want to. You have the power to express more details, but otherwise: just give me the WWID and I'll get it mounted for you and take care of the file system with the defaults. Does that make sense? Anyway, I'm sure some of the links under /dev/disk/by-* are supported, but I'm not exactly sure right now which ones, so at least some of them are definitely supported. I just wanted to say something if we're ready to wrap up. Oh, just a cool thing. Okay.