First up, Matthew, with subunit2sql.

Thank you. Thanks for the introduction. So I'm here to talk about subunit2sql, which is a little project I started to solve a problem we had given the scale of OpenStack's CI system. I'll start with some raw numbers to give a sense of how big things are. When anyone pushes a change to OpenStack, the CI system kicks off 5 to 25 DevStacks; DevStack is a self-contained bash script that brings up a running OpenStack cloud, which we then interact with using second-level virtualization. As part of that, we run about 10,000 integration tests against that environment, between 1,000 and 1,500 tests for each DevStack we spin up, and in a single run we bring up about 1,000 guests using second-level virtualization.

What this means in aggregate: for the month of December, using the database I'll get to shortly, we ran 12,000 gating jobs. The CI system has two tiers of jobs: check jobs, which run against proposed commits, and, once commits are approved, a final gate check, which speculatively executes them. So we ran 12,000 test jobs just during December, which is a light month because everyone goes on vacation for the second half. As part of that, we ran 15 million tests, which is a big number.

Because of that size, we've come up with some interesting solutions for analyzing what we're running. We have an ELK stack with Elasticsearch to analyze the logs. But because of the nature of Elasticsearch, and because we have resource limitations from running in clouds, we can only keep about 10 days' worth of history, which isn't very good for long-term trend analysis; 10 days isn't very long. We also have things like Graphite and other tools, which give us a longer-term view, but they don't give us granularity at the test level. And since I work primarily on some of the test suites, knowing how the tests behave over time is very important, so we can make informed decisions.

So the primary motivation was the ability to look at an individual test and see, for example, how many times it fails, or whether it's getting slower. And one of the big things we've seen with Elasticsearch in the community is that, since it's a queryable API, we can write programs that automate finding things in it. We use it, for example, to find known race conditions: we fingerprint them, search through the 10 days of logs, and automatically identify known race conditions in test runs. So having a queryable API was key if we wanted to do this.

That's when I started writing subunit2sql, which, very simply, takes in a subunit stream and stores it in a SQL database. For those who don't know, subunit is basically machine-parsable output from a test run. It does a lot of fancy things, like multiplexing, it supports running in parallel, it has broad support across most programming languages, it's very easy to run, and it has a lot of filters. subunit2sql takes in one of those streams, converts it into SQL inserts and updates to put it in a database, and ships three utilities. One is subunit2sql, which takes a stream in and stores it in the database. Then there's sql2subunit, which pulls a subunit stream back out of the database. And there's also a command to manage the database schema; since I wrote all of this in Python, it's SQLAlchemy with Alembic migrations.
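For those who haven't seen subunit from Python, here is a minimal sketch of reading a stream, assuming the python-subunit and testtools libraries; this is roughly the parsing half of what subunit2sql does before it writes anything to a database:

    import sys

    import subunit
    import testtools

    # Parse a subunit v2 byte stream from stdin into test events.
    stream = subunit.ByteStreamToStreamResult(
        sys.stdin.buffer, non_subunit_name='stdout')

    # StreamSummary accumulates results from the event stream.
    summary = testtools.StreamSummary()
    summary.startTestRun()
    try:
        stream.run(summary)
    finally:
        summary.stopTestRun()

    print('tests run: %d, successful: %s' % (
        summary.testsRun, summary.wasSuccessful()))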
So we can update the schema over time as needs change. It also provides a Python database API, so you can very easily write Python programs to interact with the data store you've set up. I've been using that to play around a lot; it's kind of cool.

So, the data model. The slide is kind of illegible, I apologize, but there are three basic tables. There's a runs table, and each row in the runs table corresponds to one stream; it's basically one test job. If you're familiar with the testrepository project, it's one... I forget what Robert calls it in that, but if you're using the file repository, it's one number. Then there's the tests table, which is an aggregate view of all the tests that have ever been run during any of those runs. Each time a new test is run, it gets added to the tests table, and if it's been run before, the counters for things like successes and failures get incremented; it also keeps a running mean of the run time. And then there's a test_runs table, which is basically the product of those two tables: every unique instance of a test being run. So if you've done three runs with the same test suite, there will be three entries for each test that was executed. That table stores the status of the test and the start and stop times for the test.

Each of these tables also has a key-value metadata table, because why not? We needed it for runs because we have a lot of metadata associated with our test runs in OpenStack CI, so we wanted to ensure we could store that information with the data, and since I was adding key-value metadata tables anyway, the others came along too. We actually use the test-run metadata table right now, but what it's currently used for is going to be deprecated.

How we set this up in OpenStack's infrastructure is a little convoluted, but cool, I think. We have our test slaves, which go off and run the jobs, and we have our Jenkins masters, which communicate with them via SSH, trigger the jobs, and then pull the artifacts off of them. The artifacts get stored on a log archive server, which is basically just a big file system with a web front end on it so you can pull the logs. That triggers a ZMQ event; a Gearman log client listens for those events and pushes a job onto a Gearman queue. We then have a worker that listens on the Gearman queue for a subunit event, pulls the job off the queue, processes it using subunit2sql, and dumps it into a MySQL database we have running on a Trove instance in Rackspace's public cloud. There's also public access to that database so people can interact with it, which is probably a terrible idea because we're going to get DDoSed, but anyone can connect to it. We don't publish the credentials, but we don't keep them private either, so if you want them, you can ask me.

And that's how it works. We have to do it that way because of the distributed nature of the infrastructure and some security assumptions about the slaves. The advantage is also that we can scale the workers horizontally. Right now we only need one, and it seems able to keep up with the load from the tests, but if we ever needed more workers to process the incoming subunit streams, the Gearman worker section can scale out horizontally, because the workers will just cooperate and process streams in parallel. But with one service running on one worker, it keeps up with the load of OpenStack's CI very well.
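As a rough illustration of that data model, here's a SQLAlchemy declarative sketch; the table layout follows the description above, but the column names and types are simplified assumptions, not the project's actual schema:

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Run(Base):
        # One row per subunit stream, i.e. one test job.
        __tablename__ = 'runs'
        id = sa.Column(sa.Integer, primary_key=True)
        artifacts = sa.Column(sa.String(255))  # URL to the log server
        run_at = sa.Column(sa.DateTime)

    class Test(Base):
        # Aggregate view of every test ever seen, with counters and
        # a running mean of run time, updated as new runs arrive.
        __tablename__ = 'tests'
        id = sa.Column(sa.Integer, primary_key=True)
        test_id = sa.Column(sa.String(256), unique=True)
        success = sa.Column(sa.Integer, default=0)
        failure = sa.Column(sa.Integer, default=0)
        run_time = sa.Column(sa.Float)  # running mean, in seconds

    class TestRun(Base):
        # The product of the two tables above: one row per unique
        # execution of a test within a run.
        __tablename__ = 'test_runs'
        id = sa.Column(sa.Integer, primary_key=True)
        test_id = sa.Column(sa.Integer, sa.ForeignKey('tests.id'))
        run_id = sa.Column(sa.Integer, sa.ForeignKey('runs.id'))
        status = sa.Column(sa.String(16))
        start_time = sa.Column(sa.DateTime)
        stop_time = sa.Column(sa.DateTime)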
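The worker side of the pipeline can be sketched with OpenStack's pure-Python gear library; the job name and payload handling here are made up for illustration, not the real infra code:

    import gear

    def process_subunit(payload):
        # Hypothetical: fetch the subunit file named in the payload
        # from the log server and feed it through subunit2sql.
        pass

    # Register as a worker for a hypothetical 'push-subunit' job.
    worker = gear.Worker('subunit-worker')
    worker.addServer('gearman.example.org')
    worker.registerFunction('push-subunit')

    while True:
        job = worker.getJob()           # blocks until a job arrives
        process_subunit(job.arguments)  # payload bytes from the queue
        job.sendWorkComplete()

Because several of these workers can register for the same function, adding more of them is all it takes to process streams in parallel, which is the horizontal scaling mentioned above.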
So now I'm going to talk about some of the cool things I've been able to do with this data. One of them is catching performance regressions. This is actually the first graph I generated, using pandas with a time series, and it's incredibly ugly. But it shows you some interesting things, like: what's going on at the bottom? Why is it jumping up and down? It clearly shows that something runs faster most of the time and a lot slower some of the time. The green line is just a running mean, because that's what I wrote the script to do.

When I was looking at that, I asked: what happens if I just group those runs and look at the graphs separately? You can see that for the middle group, where the test takes about 150 seconds to execute, there's still significant variance, but that's about expected. Looking at other tests, we see a variance of about 40 seconds in run time, and that just comes with the nature of running second-level virtualization on a public cloud with shared hosts: we get noisy neighbors and a lot of other variables that affect performance. But that bottom group was very consistent. The top group, I believe, is a race condition that I haven't been able to track down yet.

When I looked at those groups, the fast runs, the green line, all had consistent metadata: they were all running the stable/icehouse branch, which is older code, and they were all running on Precise instead of Trusty, which is what we run on master. The slow runs were all master, a newer branch of code, and all ran on Trusty. That's interesting because we had no idea this performance regression was there; looking at test runs one-off, you can't really tell. And then there was that slowest group at the top, where nothing in the metadata differed from the regular slow runs. My belief is that's a race condition, in the test or in OpenStack, that affects performance and that we can't isolate yet. It's theoretically possible we could use the database to track down which tests were running at the same time and try to isolate the race condition, but that involves more SQL-foo than I'm comfortable with.

The other thing we can do, which was important to me and part of the primary motivation, is figure out how often a test fails. If we run 20 million tests in a month, knowing how many times test X failed isn't easy to figure out. But with the database, you can write a query. This was just an ugly example I had in my client history to pull out how many times a test failed. This one looks for all the hot-plug NIC tests; there are two of them, which is why I used LIKE instead of an exact test ID match. And you can figure out how many times each one runs. I'm going to bake this into the Python API so you don't have to look at a very long query with an inner join like this. And it's pretty quick, at least the second time, given the size of the database; last time I looked, we had about 37 million rows in the test_runs table.

So it solves that problem, which, as a maintainer of the test suite, I wanted so I could know when it's safe to remove a test. Because if it's constantly failing, or failing 1% of the time, we probably don't want to remove the test, because it's catching something. But if it's not failing, then it might be safe to remove it. And the other thing we're starting to use it for, as of this morning, is optimizing the test runner.
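The running-mean plot described above is straightforward with pandas; here is a minimal sketch, where the connection URL, the test id, and the schema all follow the earlier simplified sketch rather than the real database:

    import matplotlib.pyplot as plt
    import pandas as pd
    import sqlalchemy as sa

    # Pull per-execution run times for one test out of the database.
    engine = sa.create_engine('mysql://user:password@host/subunit2sql')
    df = pd.read_sql(
        "SELECT start_time, "
        "       TIMESTAMPDIFF(SECOND, start_time, stop_time) AS run_time "
        "FROM test_runs WHERE test_id = 42 ORDER BY start_time",
        engine, index_col='start_time')

    # Scatter the raw run times, then overlay a running mean.
    ax = df['run_time'].plot(style='.', title='Run time over time')
    df['run_time'].rolling(window=20).mean().plot(ax=ax, color='green')
    plt.show()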
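And here is a guess at the shape of that failure-count query, again using the simplified schema from earlier; the real query on the slide is longer, but the LIKE-plus-inner-join idea is the same:

    import sqlalchemy as sa

    engine = sa.create_engine('mysql://user:password@host/subunit2sql')
    query = sa.text("""
        SELECT tests.test_id, test_runs.status, COUNT(*) AS runs
        FROM test_runs
        INNER JOIN tests ON tests.id = test_runs.test_id
        WHERE tests.test_id LIKE '%hotplug%'
        GROUP BY tests.test_id, test_runs.status
    """)
    with engine.connect() as conn:
        for test_id, status, runs in conn.execute(query):
            print(test_id, status, runs)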
We use testrepository, testr, as the test runner, which can use previous timing data to make scheduling choices: it groups the tests across multiple workers using the prior timing data. And since we have this giant repository of all the test runs over time, we can extract timing data from it and populate the scheduler with that information, so it can make more informed decisions and hopefully improve test performance a little bit. The long-term goal is to integrate this into the testrepository project so it's baked in and we don't have to... yeah, Robert's smiling... so that everyone can use it without having to run commands manually outside of it. It won't work in the OpenStack CI because of some permission issues with the slaves; we don't want to put the credentials on the slaves. But it should work for everyone else.

Here's where to get more information. There's the repository. Oh, I shouldn't have put hyperlinks, because then they don't show up on the slides, but if these slides are posted somewhere, you can click on those links and they'll take you to the repository and the documentation. And I hang out on those two channels on Freenode all the time if you have more questions. I'm always looking for more contributors, because it's mostly just me at this point, and that's a shame, because I think this is a pretty cool project.

So, are there any questions? That's all I had for prepared material. And it looks like Robert has his hand raised; I could tell that without even turning around.

Cool. So I must apologize for missing the very start of your talk, and as such my questions may be redundant, I'm sorry. What do you do with the attachments: the standard out, standard error, et cetera?

Right now they're dropped, because I didn't have a good way to store them in the SQL database. I figured some of them get quite lengthy, and I didn't want to store a 100-kilobyte text file in the database with each test, because they would be associated with the test_runs table, which already has 30-million-plus rows after running for two months. So I didn't want to store those in there. What the runs table does have is a link to an artifacts URL, so in the CI system it can point to where the artifacts can be found.

Can I add to that? Oh, yes. So if you do actually whack them in the database in MySQL and you don't select that column, it will never be read off disk. And if you gzip them as well when you're throwing them in there, they're probably not going to take up much space. So it may actually be OK to just use disk. A separate table will actually probably use more space. That might avoid any degradation... Yeah, with the InnoDB row format, when you have a blob, the row just has a pointer off the end to go look at that page. So if you don't actually select it, all you're paying is a little extra, eight bytes per row. Whereas if you did it in another table, in that table you'd use a lot more space, because you'd have to store the primary key for the lookup as well.

That's the other thing: I taught myself SQL to work on this project, so I am by no means an expert on anything.

Yeah, something like this has been on my to-do list for, what, how many years? Well... yeah. Thank you.

So there are multiple attachments, multiple attachments per test. So my expectation for the design would be a separate table of attachments.
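Here is a sketch of the timing-data idea under the same assumed schema: compute mean run times per test from the database, producing the kind of mapping a scheduler could use to bin-pack tests across parallel workers. The actual testrepository integration is the part still being worked out:

    import sqlalchemy as sa

    engine = sa.create_engine('mysql://user:password@host/subunit2sql')
    query = sa.text("""
        SELECT tests.test_id,
               AVG(TIMESTAMPDIFF(SECOND, test_runs.start_time,
                                 test_runs.stop_time)) AS avg_secs
        FROM test_runs
        INNER JOIN tests ON tests.id = test_runs.test_id
        WHERE test_runs.status = 'success'
        GROUP BY tests.test_id
    """)
    with engine.connect() as conn:
        timings = {test: float(secs) for test, secs in conn.execute(query)}
    # 'timings' maps test id -> mean run time in seconds.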
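For reference, the whack-them-in-a-blob idea from that discussion might look like the sketch below; the attachments table is entirely hypothetical, since subunit2sql currently drops attachments:

    import zlib

    import sqlalchemy as sa

    # Hypothetical table: one row per attachment, compressed body in
    # a blob column that ordinary queries simply never SELECT, so
    # InnoDB keeps it off-page and the rows stay small.
    metadata = sa.MetaData()
    attachments = sa.Table(
        'attachments', metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('test_run_id', sa.Integer),
        sa.Column('name', sa.String(255)),
        sa.Column('body', sa.LargeBinary),
    )

    def store_attachment(conn, test_run_id, name, data):
        # Compress before insert, per the suggestion from the floor.
        conn.execute(attachments.insert().values(
            test_run_id=test_run_id, name=name,
            body=zlib.compress(data)))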
Would that work OK? Yeah. OK. Cool.

Question: you said you wouldn't be able to use an integrated version of this, where testr itself could do it, in the CI system, because we don't want to put the credentials on the... right? You said the slaves are considered dirty and we shouldn't put anything with real credentials on them. Right? Yeah. So, hypothetically speaking, what if there were a RESTful API to talk to the test repository? Just as an alternative back end, right? Locally you'd have SQL, and you'd also have a web back end. Yeah. You could have dedicated low-trust credentials for your slaves.

Yeah, probably. I mean, I'd have to talk to Jim, since it's his CI system. We can figure this out when we're not in this room...

Yeah, I have some use cases where I want random developers to be able to throw data in as well, so we get data from people running make check. The idea is that you could then order make check for any software project to execute the tests that are most likely to fail for you, as a developer, first.

Yeah. So my general line of thought on this is that I think we could do something with single-use credentials. If we had a system that creates a database user and grants those rights to a slave, we could potentially do that. There are a lot of details to work out with that, but we can talk about it.

Any other questions? I think they're all in the back. Should we just have the global conspiracy over here in the naughty corner? So, you mentioned that some of the queries get kind of slow. Some of the ones on the test_runs table do, because there are millions of rows, and I'm not very good at writing schemas, and there are only three indexes on it. We should sit down. Yes, I know, I need people with SQL expertise. Occasionally, someone discovers I have some knowledge.

Well, thank you very much. Thanks.
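On those slow queries: since the schema is managed with Alembic, adding an index is a small migration. Here is a sketch, with placeholder revision IDs and an assumed index and column name:

    """Add an index on test_runs.test_id to speed up per-test queries."""
    from alembic import op

    revision = 'abc123'       # placeholder
    down_revision = 'def456'  # placeholder

    def upgrade():
        op.create_index('ix_test_runs_test_id', 'test_runs', ['test_id'])

    def downgrade():
        op.drop_index('ix_test_runs_test_id', table_name='test_runs')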