Hi. Today I'm going to present my story about my journey upstream with tlog testing. This is kind of a case study in getting started with open source software in upstream environments.

A quick introduction. My name is Scott Poore. I am a quality engineer at Red Hat. I work primarily on testing identity management software in RHEL. If you're familiar with FreeIPA, that is the upstream project that the downstream identity management product is based on.

Now I'm going to go a little bit into my journey upstream. I'll start by covering some terminology that may not be familiar to everyone. Then I'll describe where I started from and my motivation for moving upstream. I'll cover some challenges you can face when you first start working with open source, and my specific fears and concerns. I'll cover a little bit about how to contribute to open source, go into the process I went through for planning the tests for the software, then writing the tests and the mistake that I made. I'll talk about submitting my tests upstream, what that was like and what I did. And to end the section, I'll give a brief overview of lessons learned.

First up, terminology. What is open source? For starters, the term open source originated with open source software. This is software that makes the source code available to the public. To quote an opensource.com article, it is software that anyone can inspect, modify and enhance. Today the term is used much more broadly. It is applied to products such as hardware projects, which are designed and built using open source principles. The term is also applied to the general processes used in open source projects. These processes may be used, for example, to design and guide how a global community runs. As long as the process follows open source principles in allowing anyone to view, adapt and share it, it might be considered open source. The open source way describes the set of principles that outline or define what open source means today.
To oversimplify things, basically anything that is worked on in public and shared openly might be considered open source. I just mentioned working openly in public. This is a primary tenet of open source. It allows better collaboration when you can work transparently in the open.

Open source software is usually free to use, modify and share within the terms of the project's license. The license is used to dictate the extents and limits of these activities. Most open source licenses have few limitations, but there are many different licenses to choose from, and those vary, sometimes greatly. Some allow you to charge money for your software as long as you still provide the source code with it. Most, if not all, open source licenses do require that you provide access to the source code. I'm including a couple of good links to resources on open source software licenses. One is another link from opensource.com and the other is a Wikipedia article comparing licenses. The latter includes a table comparing features and different aspects of different licenses.

Now what exactly do I mean when I say upstream or downstream? In the context of working with open source software, the term upstream usually refers to the original project or work. Downstream would typically indicate someone using the upstream package with slight modifications, or using limited versions, features or patch sets that they package to provide their specific version of the software. For example, the main Linux kernel source repository would be considered upstream, while a specific Linux distribution's version that bundles only specific patches with its kernel packages would generally be considered downstream. Wikipedia's definition seems quite simple and direct: upstream is toward the original software developers, and downstream is toward a different group that uses and develops it as well. Upstream is generally hosted in public repositories.
There are ways for anyone to help contribute to it, so getting outside help can be easier. Tests are often maintained right alongside the source code in the same repository, so everything can at times be easy to find.

Downstream is often hosted in private repositories. Even if a company provides the software source code for free per the original software license, they may not, and often don't, allow public access to their source code management systems. The nature of a limited-access project like this can make getting help more difficult. The number of resources available to help you may be smaller than in an upstream community, and those people may have their own responsibilities and limited time to help. Tests downstream are often maintained in a separate repository entirely from the source code. This can make finding them even more difficult if someone doesn't point you directly to them. Also, processes downstream may differ greatly from upstream. Different types of git repositories, different workflows, or even completely different people and release schedules, all of which can complicate things for people trying to use, write and test software.

So I'll give you a bit of a flow from upstream to downstream. This is a good example. The Linux kernel source repository in this example is the top of our upstream. Then Fedora takes that, takes specific packages, and makes the settings they want for the kernel that they're going to provide, and that becomes downstream. Fedora's kernel is downstream to the kernel source repository, which is upstream. Then RHEL comes along. Red Hat takes a specific version of Fedora to use as its base to build a major release of Red Hat Enterprise Linux. This then is the downstream to Fedora. So in the flow, Fedora is both upstream and downstream. It is upstream to Red Hat Enterprise Linux and is downstream from the Linux kernel.

Now I'll talk about my starting point. I had little to no experience contributing upstream.
I'd worked with open source for a long time, and I'd even submitted a few pull requests to a couple of GitHub projects over the years, but all were very small, and in at least one case it was a patch file I sent directly to a developer who applied it for me. Working on identity management in RHEL gave me some insight into the software development model, but the tools and processes I used day to day were still very different at the time. Almost all of the tests I worked with, wrote or ran were downstream only, so I just didn't get the exposure to the upstream tests. A few of my teammates did work previously on FreeIPA upstream tests, but my involvement was limited before I got involved with tlog.

So what was my motivation? I already knew I wanted to work more upstream in general. I wanted to get involved more and I wanted to learn the upstream development workflow. tlog was already developed upstream and was a nice small but powerful project to start with. And as I was the assigned quality engineer, I got to choose where we stored the tests initially.

So what is tlog? tlog is a terminal I/O recording and playback tool. It can be run from the command line, or it can be configured to record all user shell activity. It can record to specific files, to the systemd journal, or to syslog, and it can play back from any of these sources as well. It consists of a few command line tools and a handful of libraries.

So what needed to be tested? Well, the command line tools needed to be tested. I needed to know that commands worked as expected, that recording and playback functioned as designed, and that user shell activity was recorded and logged when configured. So where did I want to store the tests? Upstream, of course.

Challenges. I'm here today because getting involved in open source can be intimidating. It can feel like trying to get into a room with a closed door that you don't quite know how to open. Do you need a key? Do you need a passphrase? Do you need a special knock?
Or do you just turn the knob? If you're slightly paranoid and have an overactive imagination like myself, you can wonder if someone attached a heating element to the other side. It can sometimes be hard to even articulate how difficult it can be to get started. And even with my years of experience in IT and quality engineering, I still had some anxiety about it.

So what are the barriers to entry? Sometimes getting started, as I've said before, can just be intimidating. It's not just an issue for people new to using open source or new to programming. You may have to learn new programming languages, new tool sets, new ways to work with people. For example, I know Python. In a former life, I knew Perl and C, but I've never used Golang or Java. So if I wanted to work on a project with code or tests written in either, I would need to start by learning those. Some projects could include tools you've never used before. I've used Expect, from Tcl, but that may be new or completely unheard of to a lot of people. A lot of projects use Git, GitHub, GitLab, or some other source code management and collaboration system. Or just some system you've never used before, or maybe a system you have used before, but they're using it in a completely different way. Maybe they're using different tags, maybe a different group of people is required to handle reviews. It could be anything.

Processes may differ greatly between projects. Some may accept patches via email, some may not. Some may require an issue tracker ticket be opened before accepting a patch, some may not. Some projects also may not have good documentation, or any documentation, that would help you get started. This can be extremely frustrating for both users and potential contributors. Another challenge could be the skill level required, or rather expected, by the project team. You may be interested in working on something, but the way it's programmed may just be beyond your current skill level. And maybe that's what interests you.
Maybe you're looking for that challenge to enhance your skills and learn more. That's what brings and keeps a lot of us here, I think, but it could be a challenge that you have to work through, possibly on your own. Another challenge, real or perceived, is that some communities are harder to get into. Some seem more tight-knit and may appear less welcoming than others, for whatever reason. In some cases, it can appear as if they only want top-tier developers. Even if that's not the case, they may not have time to train someone new.

So, my specific fears. As I mentioned earlier, working upstream can have challenges and barriers. The following were my fears. Whether real or perceived, these are things that concerned me about working upstream, things that made me anxious or hesitant to move forward with my plan. I wasn't familiar with the GitHub workflow specifics. I had a very general understanding, but I was going to have to learn new tools and new ways to use some tools in order to be able to contribute upstream. I was worried that I'd perform some step wrong in the workflow and screw up a pull request and have to drop it, or not know how to perform some step the developer asked me to do that might seem trivial to them.

The developer reviewing my work was one of the original developers of the tool. I'd worked with him a little and, in fact, I had interviewed him years before and knew that in the past he had done some kernel programming, which I've always held in very high regard. So I was a little intimidated. The main fear I had was that my code wouldn't be good enough, that he would reject a submitted pull request and I'd have to rewrite everything. I was worried that my techniques for testing would be considered inefficient or just wrong, or maybe that I wasn't adhering properly to coding guidelines. These types of criticisms are all things that make open source software better in the end, but it can still be intimidating the first time you go through the process.

So how can you contribute?
There are a few common ways that should apply to most projects. If you're familiar with the dev, doc, test process, that covers the first three ways that I mention here.

First off, you can contribute by doing development: writing code used by the actual software itself. This is probably the most common form of contribution. It's the main pull for a lot of people to open source software in the first place, the desire to write software.

The next option is to help write or maintain documentation. Most projects have some form of documentation, hopefully, but could also use a lot of help with extending it, correcting it, or just updating it to match current changes to the software. Some projects want help from someone with a technical writing background, and some just want help in general. Better documentation can help a project look and feel more legitimate. Helping with documentation can also extend to translations. If you can read and write a language that a project doesn't already cover, you could help by translating the documentation they already have into yet another language.

Another way to contribute to an open source project is testing. A lot of projects could use more help with their testing, especially writing more automated tests. Some could also use help in areas like running test days, where groups of people try out the software and report issues. No matter what the project is, there's a good chance it could definitely benefit from more testing.

Another less direct but still very important way to contribute to an open source project would be to provide help to others. If you know how to use the software but just aren't comfortable writing code or documentation, you could still help users with questions on IRC, mailing lists, or in forums. Find where the project operates, start reading and responding. It's that simple if you already know the software. Help like this frees up the time of people typically working in the other areas.
This can go a long way to making some projects successful.

So why test? To be honest, my first reason is kind of selfish: it helps people like me that test and do quality assurance for open source projects. It also helps the developers. Developers can make better products when more people test. In some cases, you can learn how the software works from the test code. It might be easier than learning the source code itself. Starting small and building from there might be easier to do with tests, and it can help focus your attention on smaller, more manageable areas of the project at a time.

A lot of projects already have tests upstream that you can review and run. Some of these need to be expanded. Some only cover the bare minimum of testing and could use a lot of expansion. Others might have a lot of tests, but could still benefit from a fresh new perspective that could identify new areas to test or new ways to test. Some may have no tests at all and be a completely blank slate.

Manual testing is time intensive and not always accurate. Human error can affect manual testing, so projects, where possible, could benefit more from automated tests. This doesn't mean that manual testing isn't useful. On the contrary, even if you can't write tests, if you can do manual exploratory testing, you can open tickets for bugs or for new tests to be written. Also, sometimes it's just easier to start with writing tests than it is to start with development of the software. Test code may have less strict requirements than the software itself. You may be allowed to use the programming language and frameworks of your choice for tests, whereas you may not have that flexibility working with the software. For example, the tests may be written in Python, whereas the code itself is written in C.

Planning tests. So at this point, I had my assignment and my general plan to store tests upstream. Now I needed to actually learn tlog and what it can do.
There are some basic patterns you can follow for learning a new tool. I even had a training course one time where the instructor spent almost as much time showing us where to find information as he did showing us how to actually use the software. For tlog, I started with the main README in the GitHub repository. It had a lot of good information about what it was meant to do and how to do it. Another good source of information to start with is to know what files are included with the package you're working with. You can run an rpm command, or whatever packaging command your distro uses, to get a full file list. Pipe that to a few different greps and you can find a list of binaries, libraries, or man pages easily.

Speaking of man pages, those can be the next logical step. Once you have a list of binaries, you can run man on one to find out how to use it. Since you need to record something before you can play back anything, tlog-rec would be the first place I'd start. Run man on that command and see how to use it. Next up would be to use the tools. In my case, I started with different methods of recording and looking at files or journal entries. Then I moved on to setting user shells to tlog-rec-session and looking at the recorded information from the user's activity. Be aware, though, that setting the user's shell directly is not the suggested mechanism for configuring recording of a user's activity. Instead, it is suggested that you use SSSD's session recording configuration options.

After learning tlog, I needed to plan the tests. This was a fairly easy step for me. I had a lot of experience already with writing test cases for command line tools. I wrote a lot of downstream tests, as I mentioned, for identity management in RHEL. Now I just needed to do it for tlog, which has a much smaller feature set. So I took each of the tools and came up with test cases for each one, covering things like expected behavior, expected failures, does the command work the way it should?
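As a rough illustration of that "expected behavior, expected failure" style of test case, a sketch in pytest might look like the following. Note that echo and ls stand in for the real tlog commands here, since tlog itself may not be installed:

```python
import subprocess

def run(cmd):
    """Run a command, capturing its exit code and output."""
    return subprocess.run(cmd, capture_output=True, text=True)

def test_expected_behavior():
    # Stand-in for something like checking that a tlog command
    # exits 0 and prints what you expect.
    result = run(["echo", "tlog-rec example output"])
    assert result.returncode == 0
    assert "tlog-rec" in result.stdout

def test_expected_failure():
    # Stand-in for checking that a bad option produces a
    # nonzero exit code.
    result = run(["ls", "--no-such-option"])
    assert result.returncode != 0
```

The same pattern extends naturally to checking specific error messages and error codes against what the man page documents.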
Also things like how configuration changes affected the tool's behavior, and a few other things here and there. Next, I needed to write up high-level test cases. For these, I'd write a brief description of what I was testing, the setup necessary, the steps to perform, input, maybe expected results, maybe expected errors, error codes, error messages, etc. Some of the details, as I mentioned, might include specific types of input. This would be necessary for steps checking that playback output accurately reflects the input.

Finally, for planning, I needed to get the test cases reviewed by the developers. So I shared the document and asked for their feedback. This would be the first actual point where I hit some of my anxieties. But the planning phase, for the most part, went well. I sent the document to the developer with a list of test cases for each tool, and most were approved with little to no changes necessary. He made a few suggestions, but for the most part, the entire process of planning the tests was straightforward and painless. It was nothing to worry about. So far, so good.

On to writing tests. The first step was to pick a language and framework to use. I've written tests using bash and special libraries in the past, but most of my test writing in recent years has been in Python using pytest. Since I was already familiar with these, there was no learning curve there. The second thing I needed to do in order to write the tests was determine how I was going to handle interactive input and output. I needed a library that would let me pass input to an interactive shell and define how I expected the program to respond. In short, I needed to be able to send input, parse output, and possibly send different input again based on some basic criteria. As I mentioned, I wanted to do this with an interactive shell. There are a few options that I knew of. I could have gone to the trouble of writing a small library using the subprocess module.
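A minimal sketch of that subprocess route might have looked something like this (the helper name is made up for illustration):

```python
import subprocess

def interact(lines, shell="sh"):
    """Feed a list of commands to a shell and return everything it printed.

    A crude stand-in for an expect-style library: it sends all input up
    front, so it cannot wait on a prompt or branch on intermediate output
    the way a real expect-style tool can.
    """
    proc = subprocess.Popen(
        [shell],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    output, _ = proc.communicate("\n".join(lines) + "\nexit\n")
    return output
```

Something like `interact(["echo recorded"])` captures the shell's output, but real back-and-forth interaction, waiting for a prompt, matching it, then responding, is exactly the part such a helper would still have to grow.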
But that would require more work on my side and complicate the test code. Another module I knew about but hadn't used before was pexpect. I already had some experience with Expect, so I thought I'd take a look. I liked what I saw, so I went with it.

Once it was time to actually code the tests, I started developing them offline. I wrote them initially so that they were independent of the source. I wanted a place I could work and get input only as needed. This was a mistake, and this is a primary lesson learned for me. I should have simply started working with my fork of the project and worked upstream from the very beginning. It would probably have saved me some time and headaches with getting the test code pull requests accepted later, and it would have been a more natural process for writing and submitting the pull requests in general.

Speaking of submitting pull requests, now it was time to submit upstream. This was the most stressful step for me. I was worried about how my pull request would be received. Was it good enough? What was wrong with it that I didn't see? Would I make a mistake in the GitHub workflow and have to drop my pull request and submit another? These were all things that I thought of that caused some level of anxiety for me. I had another problem to face as well: because I wrote most of the initial tests entirely before I started submitting upstream, I had to break up my code into smaller logical pull requests. I began with a small subset of simple tests to get the initial code in place and go through the whole GitHub workflow. This was a problem because I had to manage my tests carefully to make sure I didn't forget something as I submitted follow-up pull requests with more test code later.

I got a lot of feedback on my first pull request. The developer pointed out several things that would improve my code, including suggestions for where to store and how to organize some of it. I even got some tips on improving my setup shell scripts.
One of the suggestions was to use #!/bin/bash instead of #!/bin/sh, because not all distributions use bash for /bin/sh, and unless you're writing to POSIX-only standards, you should be shell-specific. This is good because it helps make the code more distro-agnostic.

What was the outcome of my submission upstream? My pull request, and the whole process to get my code merged into tlog, was not as difficult as I thought it was going to be. The hardest part, as I mentioned before, was breaking up my code into smaller logical pull requests that were easier to manage.

So what did I learn? I learned that working upstream wasn't that bad. It wasn't as scary or painful as I was worried it would be. Logically, I knew it wouldn't be, but fear and anxiety don't always follow logic. Aside from learning tlog, I also got to learn a new Python module, which is very useful for processing interactive I/O, as well as some tips for shell scripting. The most important thing I took away from the experience was that I had successfully navigated through a GitHub workflow upstream. I no longer needed to worry about how to do that and could move on to submitting more pull requests to add the tests I'd already written. I do wish I'd written the tests in my fork from the very beginning, because it would have made the whole process more fluid and natural. I wouldn't have had to break things up and submit in chunks from my downstream private repository while trying to make sure I didn't miss a library update that was necessary. I would have submitted my first pull request as soon as the first few tests were ready and then proceeded to submit pull requests as I wrote new tests.

Since it was the most beneficial part of my journey, other than the lesson learned, I thought it'd be good to share the basic GitHub workflow I learned. The first step, after you find the project you want to work on, is to create your fork of that project, then clone it to your local workstation or wherever you're going to code.
Next, create a branch to work in. Don't work in master; work in a branch with a descriptive name if possible. Check the project's contribution doc in case they have any preferred standards that they want you to follow. After you have your branch, start work. Create files, edit files, etc. Make sure your code works, and then git add and git commit. Now you should push your branch to your fork. When you do this, you should be presented with a link to submit a pull request to the upstream project. Follow this link to the pull request submission page and submit your PR.

When you get feedback from a reviewer, discuss suggested changes if necessary, then make the agreed-upon changes. After making those changes, you should then re-add the files and commit again. In my case, I was specifically amending the previous commit. This can help skip the need to squash multiple commits later. Not all projects like to work like this, so you may want to ask before committing and updating. Once you've committed your changes, force-push your branch to your fork. This will trigger an update in the pull request also. If you receive further requests for changes, go ahead and make them and repeat the add, commit, push steps. When the reviewer is happy, they will merge your change. You've now successfully contributed to open source.

What can you do? Write tests. So how can you get started with open source software? Start by finding a project that you would want to work on. Find something you like. If you like video processing, network scanning, games, whatever it is, if you can find a project that looks interesting to you, that may be easier. Or find something you already know. If you know some software already, go look and see if you feel comfortable writing tests for it. Maybe you already know a programming language and you can work on a specific project that just needs tests written in that language.
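The fork, branch, commit, push cycle I described a moment ago can be sketched end to end. In this sketch a local bare repository stands in for your GitHub fork, and the branch and file names are made up:

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command, raising if it fails."""
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

work = tempfile.mkdtemp()

# A local bare repository stands in for your GitHub fork of the project.
fork = os.path.join(work, "fork.git")
git("init", "--bare", fork, cwd=work)

# Clone your fork to wherever you're going to code.
clone = os.path.join(work, "clone")
git("clone", fork, clone, cwd=work)

# Work in a branch with a descriptive name, not in master.
git("checkout", "-b", "add-basic-tests", cwd=clone)

# Create files, make sure your code works, then git add and git commit.
with open(os.path.join(clone, "test_basic.py"), "w") as f:
    f.write("def test_noop():\n    pass\n")
git("add", "test_basic.py", cwd=clone)
git("-c", "user.name=You", "-c", "user.email=you@example.com",
    "commit", "-m", "Add basic tests", cwd=clone)

# Push the branch to your fork; on GitHub this is the point where
# you'd be presented with the link to open a pull request.
git("push", "-u", "origin", "add-basic-tests", cwd=clone)
```

Amending and force-pushing after review feedback follows the same shape, with `commit --amend` and `push --force-with-lease` in place of the plain commit and push.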
After you find a project to work on, the next step is to find some of the docs. You can start with the README, usually in their git repository, whether it's GitHub, GitLab or some other repository. Those usually show how to install and start using the software. After you start to get a little familiar with it, you can also look at the contributing doc. This will show you the steps that the project expects you to follow when contributing to it. For source code, they may show you things like issue tracker queues, where they have easy tickets that you might be able to add tests for or, if you're doing development, add bug fixes. After that, you could also learn the software in other ways: go through official documentation if it exists, man pages, trial and error.

After you've got a handle on the software, start writing tests. Take a look and see if they already have tests in place. See how they work. If nothing already exists, then add a test directory. You may be asked to change this later. I was. I started in one place and was asked later to rename my test directory. Then start writing your tests. Python, pytest, Java, Bash, Go, whatever. Some projects even use Ansible for their tests. You can also, at this point, go ahead and submit your pull requests for review by a dev or the project team. And once it's merged, congratulations. You've now successfully contributed to open source. I'll close out with some of the links to the resources that I mentioned earlier. And that's all I have for today. Thank you.

Thank you, Scott. It was a great presentation. Let me just stop my sharing. Now we're open for questions. So feel free to submit your questions in the chat window, and Scott is here to answer them. Scott, I think you're on mute. But now, can you hear me? Yeah, now it works. Okay. The one thing I would do, the one thing I would go back in time and tell past Scott to do to make everything easier, would have been to work entirely upstream.
The biggest problem was that I developed a lot of the tests from the start in a downstream private repository and then had to pick and choose pieces to submit upstream. So when I made my first submission upstream, I picked, I think, two or three, a small handful of tests that I wanted to contribute, and I stripped out everything from the library files, the support functions that I used, and submitted those. And then for future submissions, I created pull requests in chunks. So I would submit different types of tests, and that caused a lot of headaches for me in having to maintain mostly the library file, because the chunks were typically a whole file at a time, but I had different support functions I developed that were very specific to different types of tests. So if I had worked entirely upstream from the beginning, that would have made my life much easier, because I would have submitted when I had three tests, and then I would have submitted as I went, instead of having to pick and choose how I put it all together for each individual pull request later. So that's the biggest thing I would have changed and told myself: don't work as much downstream as I was at the time. Just work in my fork and submit as I went, which would have made everything easier.

Thank you, Scott. Do we have any more questions? Going once, going twice? And I guess we're good. So once again, Scott, thank you very much for a great presentation. I actually really enjoyed it myself, and it reminded me of my first steps in open source. I believe that's the final presentation for today's ensuring software quality track. Thank you very much, everyone, for listening and watching.