Hi. We're going to start, since we had media difficulties. I'm Robin, and today I'm covering interesting things in testing S3 implementations, mostly around RGW, and some projects to improve on that.

First of all, RGW has lots of exposed edges for testing. The security talk, I felt, could have gone into a lot more detail about what security issues S3 has at the HTTP edge, and as I say, there are a lot of them: the request headers, the response headers, the body. There are a lot of pieces that can go wrong on that S3 edge.

s3-tests was started to check that RGW's S3 is doing the right thing. Most of it today revolves around driving Boto and seeing what happens, and much of it began with recording what AWS does, replaying that against RGW, and comparing the outcome: does it work in both cases? Boto, however, cannot construct all of these tests, so some are manual HTTP requests, and in one case the request is constructed entirely by hand: open a TCP socket and send some very specifically formatted HTTP that is hard to reproduce otherwise. That is s3-tests.

Before we get to how we're going to improve it, what does it look like? This example test asks: what happens if we send a bad credential, a malformed one specifically? We should get AuthorizationHeaderMalformed back. But what is really "expected"? Is it something the specification says? It's not in the specification; RGW used to mirror what AWS did, and then AWS changed what they said they were doing. I'll come back to that example in a moment.

This also means sometimes testing what the response says. Here's a bug I found while writing this presentation: this test has implicit information in it. The length of the content happens to be 11 bytes, but we just wrote "11" there; we didn't actually measure it, and the test would break if we changed the payload.
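The 11-byte problem suggests the general fix: derive expected response values from the test data instead of hardcoding them. A minimal sketch (the payload and values here are made up for illustration):

```python
# Derive expected response headers from the test data itself, instead of
# hardcoding values like "11" that silently break when the payload changes.
body = b"foo_content"                 # example payload, 11 bytes today

expected_content_length = str(len(body))
assert expected_content_length == "11"

# Same idea for a ranged read: for Range: bytes=4-7, RFC 7233 says the
# response carries Content-Range: bytes 4-7/<complete-length>.
start, end = 4, 7
sliced = body[start:end + 1]          # HTTP byte ranges are inclusive
expected_content_range = f"bytes {start}-{end}/{len(body)}"
assert sliced == b"cont"
assert expected_content_range == "bytes 4-7/11"
```

If the payload ever changes, the expectations move with it instead of going stale.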
And if we send Range: bytes 4 through 7, we should make sure the Content-Range response header mirrors that.

But things end up going wrong. Boto and AWS changed parts of the specification. If you want to set up replication rules and an older version of Boto got upgraded in the background, suddenly your RGW says no, we're not doing this anymore, because Boto started sending the version 2 schema (on the right) while RGW for a long time only supported the version 1 schema (on the left). There isn't a test case for this at the moment; it's not detected.

Here's one where I didn't include the rest of the test case. As I said, in some cases you have to build the TCP socket yourself. If you want to muck around with what you send for the Expect: 100-continue header, you can't really do that within the constraints of any HTTP library. This is going to become more important in the future if we start doing this over HTTP/2. Yes, I know AWS doesn't offer S3 over HTTP/2, but MinIO and a number of other providers do; it is coming. What happens if you break in the middle of your headers? Things might go weird.

And lastly, there's one other category of tests: the S3 transactional tests. If I do a sequence of S3 operations, what happens? In this case, if we upload five objects, we should get five objects back in the listing. And there's a related test bug here: if I do a listing in a certain way, I get 99 objects back, not the 100 objects I expected.

So how do these relate? The tests mentioned so far all focus on what one S3 client, Boto, does, and the error paths around that. They don't cover things in S3 that clients don't support well yet. For example, on GetObject there is a little parameter you can include to say: I know I made this a multipart upload originally.
Can I have the original parts back? One S3 SDK supports that, the Java SDK; otherwise, if you include it in the request manually, you can still get that information back. Boto and others do not support doing that, so you have to test it by hand. There are also parts that really don't fit in any S3 SDK, like POST object, and we'll get to the fun things about POST object soon.

But the last point here is the one that really mattered: what is complete? When is everything tested? What is "everything", and how do you make sure of that? At Cephalocon a couple of years ago, in the further-work section of a talk I did with Ali Maredia, the question was: how do we improve this testing? And I said we have to start measuring things. To evaluate X, where X in this case is RGW, we have to measure it on both sides, run something in the middle, and compare. The idea was good, but it languished for a while for lack of cycles and time to spend on it.

I did some research but didn't get any further until 2021, when I was fortunate enough to pitch an idea to my boss at the time: let's get an intern at the company and have them work on this interesting project. (Sorry, that image moved.) At the same time, I wrote up a simpler, related version of it as a Google Summer of Code project idea. And I happened to get students for both projects. I was only expecting one at most, so I was very surprised to wind up with two students who could work on related projects together: one through Google Summer of Code, the other a paid student intern for me at DigitalOcean.

So I split the projects up. The intern project ran a bit longer and was definitely more complicated; it worked on the structure and the concept of the thing first, getting it understood.
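The part-level GetObject mentioned above can be driven by hand: the request carries a partNumber query parameter, and the response then includes an x-amz-mp-parts-count header with the total part count. A sketch of just the request target, with a placeholder key and no signing shown:

```python
from urllib.parse import urlencode

def get_object_part_target(key: str, part_number: int) -> str:
    # GET /<key>?partNumber=<n> returns just that original part of a
    # completed multipart upload; the response carries the header
    # x-amz-mp-parts-count with the total number of parts.
    return f"/{key}?{urlencode({'partNumber': part_number})}"

# "big-object" is a made-up key for illustration.
assert get_object_part_target("big-object", 2) == "/big-object?partNumber=2"
```

A hand-built request like this is what a test has to construct when the SDK at hand does not expose the parameter.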
The Summer of Code student, a fantastic implementer, focused on the targeted pieces. For Summer of Code we said: let's measure Boto. How much of Boto are we actually using while s3-tests runs? Are we exercising everything in Boto's S3 implementation that we know is supported in RGW? We exclude the parts that aren't yet supported in RGW, and flag the parts we claim to support but aren't testing at all.

For the intern version, the fun, difficult C++ part: measure RGW while running s3-tests, see which parts of RGW didn't get tested, and ask what we can do about them. The intern came up with this great diagram (sorry, the conversion of the diagram went a bit wonky): run RGW, run the tests, stop it, and take code coverage. From that we separately generated a summary, detailed branch and line data, and branch reports, which wind up being very important; I'll show some details shortly. Then we pull the data together and get a little web front end to go look at what's going on and what we didn't cover.

Jumping over, the easier version to read was the Python side: we ran this, and here is the pile of lines that never got used. This sample shows s3-tests itself rather than the Boto lines, just because it was much easier to read for this example. Note the missing-lines section: that chunk of lines didn't get executed when we wanted it to. It turned out to be a piece of dead code in s3-tests, so let's clean that up while we're there, because we know it's something we're not testing. And from this we had the function traces: run it through and see, first of all, which functions it's hitting. I expected it to hit some other functions, but it didn't; at least now we know which functions are covered so far.
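The function-trace idea can be sketched with the standard library alone: record every function entered while the "suite" runs, then diff that against the set of operations we claim to support. The operation names below are stand-ins, not Boto's, and the real project used a proper coverage tool rather than this toy tracer:

```python
import sys

# Stand-ins for the S3 operations we claim to support.
def put_object():
    return "put"

def delete_objects():
    return "delete"

called = set()

def record_calls(frame, event, arg):
    # Collect the name of every Python function entered while tracing is on.
    if event == "call":
        called.add(frame.f_code.co_name)
    return None

sys.settrace(record_calls)
put_object()                      # our "test suite" only exercises one op
sys.settrace(None)

supported = {"put_object", "delete_objects"}
never_tested = supported - called
assert never_tested == {"delete_objects"}
```

The diff is the interesting output: operations we advertise as supported but that no test ever touched.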
And we can carry on, narrowing down what we're looking at. One of the early things that stood out: for DeleteObjects, the S3 specification says at most 1000 objects per request, and the tiny branch that returns "you're sending too many" was never tested. This was the first test improvement submitted by the Summer of Code student: that tiny branch, two extra lines, is tested now. It's proof that the concept works. We found something that wasn't tested, we added one test case, and now we can show it is tested. Sure, it's a tiny thing, but it's great.

Back on the C++ side, we carried on asking what is tested, and it was a lot more difficult to measure. (Unfortunately, I don't have my presenter notes for this one.) March through May was just getting it working, and then from May through June we started adding new tests. You'll see that in most of these cases we weren't testing an entire chunk of functionality at all, so let's add a simple test that starts covering those files. Getting one test to cover some lines of a file is easy. Covering all the branches, being sure every possible code path is exercised, is hard. You can see that branch coverage didn't grow much higher; we did cover some more things, and that was a good start.

Now I want to dig into one of the good bugs we wound up finding. While I was with DigitalOcean, we had a couple of customer reports along the lines of: if I try to upload using browser POST, in some weird conditions it just returns "too small", and I absolutely know my data is not too small; I'm sending megabytes and megabytes. The browser POST policy said the upload must be at least one megabyte and not more than one gigabyte, and we could dump the request and see that the size was absolutely correct.
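For reference, the policy condition involved here is S3's content-length-range, which bounds the total size of the posted object. A correct check compares the total upload size against both bounds; a sketch with the limits from those reports:

```python
# Model of the browser-POST policy from the customer reports:
# content-length-range of at least 1 MiB and at most 1 GiB.
# Illustrative only; RGW's real check is C++.
MIN_LEN = 1 * 1024 * 1024           # 1 MiB
MAX_LEN = 1 * 1024 * 1024 * 1024    # 1 GiB

def content_length_in_range(total_bytes: int) -> bool:
    # The *total* size of the upload must satisfy both bounds.
    return MIN_LEN <= total_bytes <= MAX_LEN

assert content_length_in_range(3 * 1024 * 1024)      # 3 MiB: fine
assert not content_length_in_range(512 * 1024)       # 512 KiB: too small
assert not content_length_in_range(2 * 1024 ** 3)    # 2 GiB: too large
```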
But trying to find why that check triggered, we were pulling our hair out. Then we ran our new testing against it, and this tiny path here, highlighted in my fix, triggered when it shouldn't have. The reason was use of the wrong variable: the length in this case was the length of the current piece being uploaded, not the running offset across the whole upload. We wanted the offset, so that when all the segments are concatenated together we ask whether there are enough bytes in total. The fix was tiny, three bytes to fix the bug, but it was hard to find.

Now on to further ideas. This part is mostly talk; I don't have slides for it, unfortunately. I was at DigitalOcean until February of this year, and I no longer have some of those materials and slides.

We had started to build a new idea for testing, with the intent of publishing it. AWS has built all its services on JSON service definitions: a structured format describing all of the operations, from which every single SDK is generated. We said: let's take those service definitions and generate tests from them instead. They cover all possible headers and all possible operations, and since we know which ones are supposed to be supported by RGW, we can throw the generated tests in to carry on fleshing out the rest of the test suite. This would have wound up covering most of the well-formed inputs for S3 testing. By its nature it wouldn't cover what happens when I send a malformed input, but then we said: now that we know where everything is supposed to go, what can we do to start fuzzing it?
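A sketch of what that malformed-input fuzzing aims to catch: mutate a well-formed value and confirm server-side validation rejects anything carrying a raw CR or LF, the classic header-injection vector. The validator here is illustrative, not RGW's code:

```python
# A response header echoed back from client input must never contain a
# raw CR or LF; otherwise the client can inject extra headers.
def header_value_is_safe(value: str) -> bool:
    return "\r" not in value and "\n" not in value

# One well-formed origin plus two mutated variants a fuzzer might emit.
mutations = [
    "https://example.com",                      # well-formed
    "https://example.com\r\nX-Injected: 1",     # CRLF injection attempt
    "https://example.com\nSet-Cookie: x=1",     # bare-LF variant
]

results = [header_value_is_safe(m) for m in mutations]
assert results == [True, False, False]
```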
In doing that, a number of subtle errors have been found with mis-formatted CORS headers. There was one that wound up becoming a security bug a few years ago: you could send a newline in a CORS header and then do HTTP header injection. So malformed inputs looked promising to us.

I think that covers most of my material, and I mostly left the rest for questions. I'm still well within time. Thank you. Questions?

[Question: which coverage tools were used, on the Python and C++ sides?] If I remember the name right, pycobertura turned out to be very good on the C++ side. I can't remember the name of the package on the Python side, maybe PyCoverage. I'm sorry, I don't remember that part; it is in the linked materials, and the presenter notes that I couldn't see because of the media display issues would have helped answer that as well.

[Josh asks whether there might be a future in testing other parts with a similar idea, the same kind of coverage measurement.] To repeat what Josh said so it winds up on the recording: in the past there were some parts in C++ that could do a similar thing, but they became unmaintained, and the unmaintained pieces got pruned. We could possibly bring that back, because we've built infrastructure to do a bunch of this measurement.

Other questions? Going once, going twice, sold.