Thank you, everybody, for coming today. We have a special guest calling in from Switzerland, so I appreciate you staying up late. Manuel is a postdoc at ETH. Prior to that, he was a PhD student at Johannes Kepler University in Austria. His background is not databases; his background is compilers and PL. But he's sort of, I mean, are you going to tell the story in your talk, or no? Go ahead? Okay. So the reason he got involved in testing databases is that, for his own research, he was putting some data in a database, and then it hit a bug and he lost all his data. And so he's like, well, this is bad, let me go run some tests on databases and see if I can find bugs. So that's what he's done. SQLancer is a new project that he's been working on for a while, and this is sort of the hottest thing now in database testing in the open-source world. And, I mean, you'll talk about this, but a lot of people are already adopting it. So for a project that has only been around for a short amount of time, you've had an amazing response and reception. That's amazing; the power of databases, right? Okay, before we turn it over to Manuel: the way we want to do this is that if you want to ask a question, by all means just interrupt him, but be sure to say who you are and where you're coming from, so that everyone knows who everyone is. All right, Manuel, go ahead.

Thank you, thank you for having me, and also thank you very much for this generous introduction. I'm really glad that you gave me the opportunity to present this work today. As Andy said, I've been working on finding logic bugs in database management systems together with my advisor Zhendong Su here at ETH Zurich. In our work we have been focusing on finding logic bugs, which immediately poses the first question: what are logic bugs?
So let's assume that we have some query that we send to the database management system, as you can see here. In this example, and also in later examples, I use the φ symbol to denote a boolean predicate. Let's assume that we have a database with two records for which the predicate evaluates to TRUE. Then we would expect that both of these rows are fetched, but in the case of a logic bug, only one of them might be fetched. Another example of a logic bug might be that a row is mistakenly fetched, or that the contents of the result set are incorrect. We argue that finding logic bugs is a difficult challenge, because they might go unnoticed by users and also by the developers of database management systems. This is unlike crash bugs, for example, where a user gets immediate feedback because the process terminates, for example with a non-zero exit code.

Isn't the problem already solved? If we look at the testing processes that database management systems use, or at least mature ones like SQLite, it might seem so. Specifically, SQLite has several hundred times as much test code as source code, which is already quite an impressive number. SQLite's test cases achieve 100% branch coverage; even better, they achieve 100% MC/DC coverage, which is a coverage metric typically used for software development in the aviation industry.
So this is also very impressive. SQLite is also extensively fuzzed, for example by Google's OSS-Fuzz project and also by individual researchers. And just as another example, SQLite does several kinds of testing, like unit testing, but they even do anomaly testing, where they verify that SQLite works correctly even in the presence of out-of-memory situations, input/output errors, or power failures. Nevertheless, in our work we found over 400 bugs in popular and widely used database management systems: quite a number in SQLite, but also in other systems like MySQL, MariaDB, PostgreSQL, CockroachDB, TiDB, and DuckDB.

So what are the ingredients for testing database management systems, or for testing software in general? First of all, we need an effective test case. In our context, this means that we want to generate a database that can stress the database management system in order to expose a bug, and similarly a meaningful query. We also need a test oracle. A test oracle validates the query's result and basically says whether it is correct or not. For the purposes of this talk, we assume that an automatically generated database and query are already given. We assume this because it is actually the easier part of the problem, and it has already been well researched; there are, for example, many random database and query generators. We focus on the second part, namely proposing test oracles for finding logic bugs. So what is a test oracle? As I said, given some input, it determines whether the system works as expected. One test oracle would be a programmer who writes down the specification, like writing down the expected result of a unit test, but here we are interested in automatic test oracles that can do this automatically. So what could we consider to achieve this?
I mean, the first thing that you are probably all familiar with is fuzzing. We could take a random query generator like SQLSmith, which is perhaps the most widely used one. Such a query generator sends random queries to the database management system and hopefully exposes a bug like a segmentation fault, where the test oracle is basically that the process exits with this error. While SQLSmith and other random query generators have been very effective in finding crash bugs, unfortunately we cannot use them to detect logic bugs, because they simply cannot detect these cases.

One approach to detect logic bugs, which works at first sight, is differential testing, where we have a query generator that generates a SQL query that is then sent to multiple database management systems, for example PostgreSQL, MySQL, and SQLite. We then get back three result sets, and we can compare them to determine whether they are the same or not. If they are the same, we didn't detect a bug; but if they differ, then at least one of the systems is affected by a bug. This approach was first proposed by Microsoft Research around 2000, and the authors mentioned that differential testing was very helpful for them; they could detect many bugs, but only for the small common subset of SQL. This is also what we, and actually other people too, realized.
So, for example, Cockroach Labs published a blog post where they stated specifically that they could not use PostgreSQL as an oracle, PostgreSQL being the database management system whose SQL dialect is closest to CockroachDB's, the reason being that it has slightly different semantics, and it would be tricky to write queries that execute identically on both systems.

Besides fuzzing and differential testing, other approaches have also been proposed. In terms of logic testing, perhaps the most impactful one is by Khalek and others. They had the idea of using solvers, like SMT solvers, to tackle this problem, and they could find a number of injected bugs and also reproduce already-known bugs. But they found only one new bug, and here we believe that solvers might not have a high enough throughput to detect many bugs. As another example, approaches have been proposed for performance testing; I guess a number of you heard the presentation a week or two ago on APOLLO. That approach is effective for finding performance bugs, but it is unclear how logic bugs could be detected with it. Another approach was to test the accuracy of the query planner's cost model. Overall, we have to conclude that the problem of testing database management systems to find logic bugs has not yet been well addressed, and this is where our work comes in.
So we have been working on three new practical approaches for finding logic bugs, namely Pivoted Query Synthesis (PQS), Non-optimizing Reference Engine Construction (NoREC), and Ternary Logic Partitioning (TLP).

Pivoted Query Synthesis was our first approach; we found about one hundred bugs using it. The core idea of this approach is that we want to generate a query for which it is ensured that it fetches a randomly selected row, to which we refer as the pivot row. If the pivot row is not contained in the result set, we have detected a bug.

The second approach that we proposed was Non-optimizing Reference Engine Construction. The idea of this approach was to generate a query that is likely optimized by the database management system, and then transform it in a way that the system cannot effectively optimize, thus allowing us to detect bugs specifically in the query optimizer of a database management system.

Is that done in SQL, or is that something lower-level?

That's actually in SQL. It's a completely black-box approach, and the advantage is that while it's not obvious how to come up with it, it's actually quite straightforward to implement.

And the last, or rather the latest, approach is Ternary Logic Partitioning. This has allowed us to find over 50 bugs in widely used database management systems. The idea of this approach is that we want to partition a query into several so-called partitioning queries, each of which computes a partition, and then to combine the partitions so that the same result set as for the original query can be generated. This approach is actually applicable to quite a number of different SQL queries. And since you just asked, Andy: all three approaches work as black-box approaches. They operate only on the SQL level, which makes them general and widely applicable to test any kind of system that supports SQL.

So maybe this is SQLancer?
It's SQLancer, the combination of all three?

Exactly, yes.

Beautiful. All right, keep going. Sorry.

No, thanks, please go ahead whenever you have any questions. Yeah, so, as just mentioned, SQLancer implements all three approaches. Maybe not all of them are equally maintained, but we are actually currently working on improving the code quality and also the testing infrastructure of SQLancer, and since last Tuesday or so it is actually available as open-source software. So feel free to check it out and report any issues that you might find; if you have any questions, we also have a Slack workspace.

With that, I want to continue by introducing one of the techniques that we have been working on, namely Ternary Logic Partitioning. The idea of Ternary Logic Partitioning is based on a more general idea that we came up with, to which we refer as query partitioning. The idea looks as follows. First of all, we have a query generator that generates a random query Q, to which we refer as the original query. You can also see here that, in general, we use this circle to denote the result set of the respective query. We then want to partition the result set of this original query by deriving multiple queries from it. Each of these derived queries computes a part of the result, and these parts can then be composed to yield a new result set. These result sets should be the same: if they are, then everything seems to work correctly, and if not, then we have detected a logic bug in the system. The queries that we derive from the original query are what we will refer to in the following as the partitioning queries, and the parts of the result that they compute are the partitions. This diamond symbol here is what we call the composition operator. I guess the general approach probably sounds plausible to you, but of course it's not obvious how to realize it. Specifically, the key
challenge is that we want to find a valid partitioning strategy: one that works, for which we can define a composition operator, and so on, but that also stresses the database management system in different ways, so that we can find bugs in it.

The main insight of this approach is the following. Let's consider some given predicate φ, which is typically randomly generated. For any given row, exactly one of the following must hold: either φ evaluates to TRUE, or NOT φ evaluates to TRUE, or φ IS NULL evaluates to TRUE. Basically, since SQL uses a ternary logic, φ evaluates either to TRUE, to FALSE, or to NULL. We refer to these predicates based on a given φ as the ternary predicate variants. The idea is now the following: assuming we have some given result set, we can use this observation to partition it into three parts, by selecting the rows where the predicate holds, where it does not hold, and where it evaluates to NULL. That is the core insight of our approach.

Now, this was perhaps a little bit abstract, so I want to continue by demonstrating a bug that we found in MySQL. You can see here that we first generated two tables, both of which contain one row: here a zero value is contained, and here a minus zero. And you can see that this query wants to fetch all records where the two values compare as equal. While this predicate was expected to evaluate to TRUE, unexpectedly it didn't, and MySQL failed to fetch the row.

What was the data type for those columns?

That's a good question. I think for t1 it was FLOAT, because we have minus zero. And I think for t0 it was INT, but I'm not sure, it might also have been a FLOAT. The underlying bug here was, I think, a bug in MySQL's hash join.

Interesting.
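The partitioning check just described can be sketched in a few lines of Python against SQLite, via the standard sqlite3 module. This is an illustrative sketch rather than SQLancer's actual (Java) implementation; the helper name tlp_where_check and the tiny single-column table are made up for the example, loosely echoing the zero/minus-zero data above.

```python
import sqlite3
from collections import Counter

def tlp_where_check(conn, table, predicate):
    """TLP oracle for WHERE clauses: for every row, exactly one of
    phi, NOT phi, and phi IS NULL holds, so the three partitioning
    queries combined via UNION ALL must reproduce the unfiltered
    result set of the original query (as a multiset)."""
    cur = conn.cursor()
    original = Counter(cur.execute(f"SELECT * FROM {table}").fetchall())
    partitioned = Counter(cur.execute(
        f"SELECT * FROM {table} WHERE {predicate} "
        f"UNION ALL SELECT * FROM {table} WHERE NOT ({predicate}) "
        f"UNION ALL SELECT * FROM {table} WHERE ({predicate}) IS NULL"
    ).fetchall())
    return original == partitioned

# Hypothetical single-column table with a zero, a minus zero, and a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(0,), (-0.0,), (None,)])
ok = tlp_where_check(conn, "t0", "c0 = 0")
```

A return value of False would flag a logic bug: some row was either dropped or duplicated by one of the three partitioning queries.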
Okay, cool. So now the question is: how did we derive that this is a bug based on TLP? We did it in the following way. First, we generated a random query, which looks like this: basically just fetching the cross product of all values in t0 and t1. This query is quite simple, so we wouldn't expect MySQL to process it incorrectly, and indeed MySQL fetched the correct result for it. Based on this query, we then derived the partitioning queries. You can see here that we have the three ternary predicate variants: once where the predicate evaluates to TRUE, once where it evaluates to FALSE, and once where it evaluates to NULL. For this query, MySQL unexpectedly didn't fetch any rows, which is unexpected because we assumed that these two result sets would be the same. This allowed us to detect the bug and basically reduce it to the version that I showed on the previous slide, to conveniently report it.

So you automatically do that coalescing?

Exactly, yes.

Interesting. Cool.

So basically we just randomly generated this query, and then there is a mechanical process that can derive the partitioning queries from the original query. If we have time, I can actually show you at the end, in a demo, how this is implemented, because it is implemented in only a couple of lines of code. So TLP, and also some of the other approaches that we proposed, like NoREC, are quite easy to implement, especially since they operate only on the SQL level.

The demo is really awesome.

I have a question, sorry for the interruption. I'm Liang, I'm a PhD student here, working with Andy on databases. I'm wondering what kinds of bugs this will cover. Clearly it covers bugs in evaluating the predicates, but, for example, if the query is an aggregation query, you may have a bug in computing the average value or some mathematical operation on a set of rows.
Will this approach also be able to discover those kinds of bugs?

Yeah, that's a very good question, and I will actually still present some examples for testing aggregate functions, so I will come back to this later. But a very good question, thanks.

Sorry, can I ask a question here? My name is Kai, I'm a PhD student at Georgia Tech. So my question is: it sounds like this technique basically works because the left query and the right query are semantically equivalent, right? So they are expected to return the same result. I have been working on testing databases for a couple of months, and my thought was: there are already many systems that have a database optimizer, which transforms a query into a different query that can execute faster. Have you ever tried using a database optimizer to rewrite the query, so that you get two different queries and can check whether they return different results, to expose a bug?

Yes, I think that's also a very good idea. It would be possible to apply optimizations similar to what database management systems do, but I believe this would involve more work, because we would need to define these individual rules for all the operators, to basically define the metamorphic relations.

Yeah, but the thing I was thinking about is that all these transformation rules have actually already been implemented by the other systems, right?
So from the testing perspective, we kind of get all these transformation rules for free. But these transformation rules serve the purpose of optimizing the execution of the SQL, so they might not be suitable for testing purposes. I'm just wondering whether you have tried something like that.

Yeah, thank you. I think that might be a good idea. But one goal that we wanted to achieve is to have an approach that works in general for all database management systems, one which we would not need to implement specifically for each query optimizer, basically tailored to each database management system. This works only on the SQL level, and I think that's a core advantage of the approach: you do not need any knowledge about the underlying query optimizer. You can just randomly generate queries, derive the partitioning queries, and then check the results. But it would be an interesting idea to explore this further, I think.

Yeah, thank you.

Not to toot my own horn, but we actually generate the correct result for this query, so I'm happy.

Okay, keep going.

Our database doesn't have this bug; we crash on something else. Okay. So you can see here in the illustration how the previous example relates to this approach.
So, first of all, you can see here that for the original query Q we had the simple query that just fetches the cross product of all values. Then we have the partitioning queries: one with the non-negated version of the predicate, one with the negated version, and finally the IS NULL version. For the composition operator, which I didn't mention in the example just now, we use the UNION ALL operator, which combines the records without filtering out duplicate records.

Now, I've demonstrated that the technique is applicable to test WHERE clauses, but I also received the question before whether we can test aggregate functions, and the answer is yes. The approach is general in the sense that it can be applied to quite a number of different features, and now I will demonstrate how we can find bugs in GROUP BY clauses, in HAVING clauses, in DISTINCT queries, and also in aggregate functions.

What about other types of queries, like, say, common table expressions or nested queries? Is that just because you haven't gotten that far in actually testing those queries, or is there something more fundamental about your approach such that it wouldn't work?

I guess it's just that we haven't really thought about testing them yet. We concentrated on the core patterns, but I think the idea of query partitioning, and also of ternary logic partitioning,
there are several further combinations possible, and I think we could test more different features.

Okay, cool.

Yeah, so I already explained how we test WHERE clauses, or their correct implementation. For this and for the subsequent test oracles, I want to use this kind of table format here, which is sufficient to fully describe the test oracles. You can see here that we have the original query for the WHERE oracle; then, in the second column, we have the partitioning queries, where the ternary predicate variants appear in the WHERE clause; and finally, the third column describes the implementation of the composition operator. You might not all be familiar with this symbol here: it denotes the multiset addition, which combines the records without filtering out duplicates, and in SQL we can implement it using the UNION ALL operator.

Now, how can we test HAVING clauses? That is, I would say, quite a straightforward extension. This time we have the ternary predicate variants in the HAVING clause, and besides that we can also randomly generate WHERE clauses and GROUP BY clauses, which are simply copied over to the partitioning queries. The composition operator is implemented in the same way as for testing WHERE clauses.

For testing DISTINCT clauses, the original query contains a DISTINCT clause, and for the partitioning queries we can either decide to also include it or to omit it, because the insight here is that the composition operator is implemented differently: you can see that we are using the set union, implemented with UNION in SQL. For GROUP BY clauses it looks similar; here it is important to note that we need to fetch all the columns that are denoted in the GROUP BY clause, because here, too, we use the set union, which filters out duplicate records.

Now we come to the more difficult, or also more interesting, part: how do we test aggregate functions?
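Before turning to aggregates, the DISTINCT oracle just described can be sketched against SQLite, with UNION serving as the set-union composition operator. Again a hedged sketch, not SQLancer's code; the helper tlp_distinct_check and the sample table are invented for illustration.

```python
import sqlite3

def tlp_distinct_check(conn, table, column, predicate):
    """TLP oracle for DISTINCT queries: the composition operator is the
    set union (UNION in SQL), which removes duplicates exactly as
    DISTINCT does, so DISTINCT may be kept or dropped in the
    partitioning queries (dropped here)."""
    cur = conn.cursor()
    original = set(cur.execute(
        f"SELECT DISTINCT {column} FROM {table}").fetchall())
    partitioned = set(cur.execute(
        f"SELECT {column} FROM {table} WHERE {predicate} "
        f"UNION SELECT {column} FROM {table} WHERE NOT ({predicate}) "
        f"UNION SELECT {column} FROM {table} WHERE ({predicate}) IS NULL"
    ).fetchall())
    return original == partitioned

# Hypothetical table with duplicate values and a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(1,), (1,), (2,), (None,)])
ok = tlp_distinct_check(conn, "t0", "c0", "c0 < 2")
```

Because UNION already deduplicates, a mismatch here would indicate that the system's DISTINCT handling dropped or retained rows incorrectly.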
There, our test oracles are specific to the respective aggregate function that we want to test. Probably among the simplest aggregate functions to test are the MAX and MIN functions. For these, the original and partitioning queries look perhaps as you would expect, but for the composition operator I want to point out that, for the previous test oracles, a partition corresponded to a subset of the result, while here it is an intermediate result: for MAX, each partition computes the maximum value over its rows, and in order to derive the overall maximum value, we need to apply another, outer MAX function.

Since this gets a little bit more complex, I also want to show you a concrete example, a bug that we found in CockroachDB. You can see that this actually required the experimental vectorize flag and also interleaved tables. We sent the original query to CockroachDB, and it returned a single row with the value NULL. Then we derived the partitioning queries from it. You can see the ternary predicate variants again here; each of them computes a maximum value, and then we need this additional outer application of the MAX function to derive the overall maximum value. This allowed us to detect this bug in CockroachDB.

For that particular example, out of curiosity, can you go back to the last slide? Sorry. You're using their vectorized engine, but you're also using physical denormalization, because you're interleaving into the parent table. So who told you to turn those things on? Is that something SQLancer automatically generates, or is that custom code written for CockroachDB that says: when you create a table, here are the different variations you can do?

Yeah, that's a very good question.
So, in general, the database generator and also the query generator are typically specific to each database management system that you want to test. At the beginning, before we apply our test oracles, we randomly generate a database: basically some random CREATE TABLE statements first, then some INSERTs, and also all the other potential options that are applicable in this context. Only then do we apply the test oracles. All of these are specific to the database management system that we want to test.

In all the examples you've shown so far, there's been one column in each table. Does this also work for multiple columns? Does that actually matter for what you're trying to do?

That's also a good question. We support an arbitrary number of columns, but to reproduce most test cases we only need one. For the majority of the test cases that we found, we actually also only need one table. I think in our first work we analyzed this more closely, but I don't remember the exact numbers. And basically, before we report a bug, we try to minimize or reduce the test case as far as possible. There are some automatic tools to do this, for example C-Reduce, which has C/C++-specific passes but also works quite reasonably well for SQL code.

Hi, can I ask a question here? How many queries do you need to generate for testing, say, CockroachDB, to find these bugs? Have you counted how many test cases you're generating?

Yeah, that's also a good question.
So for our evaluations, we always developed the tool and applied it to the database management system at the same time, and we did this typically over a longer period of, let's say, three to five months for each database management system that we tested. So we cannot give you exact statistics on this. But my personal experience is that after we implement, let's say, a new operator and let SQLancer run for, let's say, one minute, then typically most of the bugs are already found within this one minute. Depending on the database management system, SQLancer generates, let's say, about 10,000 queries a second, and it seems that many of the bugs can already be detected very quickly.

Thank you.

Thank you. So just a quick follow-up on the question about supporting different systems. You just said that for a different system, the system developer would provide the database generator and the query generator, right? Then I'm wondering about the rules: do you have to re-implement all your rules for each system, or can you somehow reuse the same set of rules but route those implementations to different systems through some layer in between?

That's a good question. I think that's more of an implementation detail; currently SQLancer is not implemented that cleanly. But since I have the source code here, I might as well shortly explain, or demonstrate, it.

Sure.

So here is, for example, the implementation of the WHERE-clause oracle.
You can see here that we have a common partitioning base, where we generate, for example, a random SELECT statement; you can see that we generate random joins, FROM clauses, and so on. And here is the class that implements the query partitioning WHERE oracle, where we basically first set the WHERE clause to the non-negated version, the negated version, and the IS NULL version, and then get the result sets. So it's actually quite reasonable to implement; it doesn't require too much effort. And, maybe also very quickly: the core component for generating random queries is random expression generation, and there, too, we have quite a naive approach implemented. We just select some of the applicable options, for example a unary prefix operation, and then we select any of the applicable operators, for example NOT, unary plus, unary minus, and so on, and at the end we have a visitor that derives a string. So it's actually, I would say, reasonable in terms of implementation effort.

Right, but I also saw that this specifies DuckDB-specific prefix operators. That means that for another DBMS, like CockroachDB, this thing would need to be re-implemented, right?

Right, exactly. So I guess it would make sense to try to factor this out, but we haven't done this yet, and every database management system provides unique operators, unique options, and so on, which is why they are basically mostly separate.

Right, definitely. But it also means, how do I say this, that it limits the scope. It's not like a fuzzer, where the fuzzer just
It's not like the fuzzer or the fuzzer just Permutes some random stuff and then Eventually you'll find something your case like if I don't write a code to cover You know part of the system then like your thing never never never explores it As you sort of want both and you want the fuzzer plus this thing Or take the schema that the fuzzer generates and somehow extract out the You know these different constructs of the operators Or just as you said like have a common denominator, you know Lowest common denominator of the sico standard across multiple systems and then build the customer stuff on top of it Yeah, so can I have like one very very very very very quick question here. Sorry. So like how how how did you randomly generating the data? Like the data Yeah, so basically the insert statements. They are also randomly generated So when you say like So when you say like each second you're generating about like 10 000 queries Are these like curious for one instance of database or for each query? You kind of like have their own instance or like what's the rate ratio here like Like number of queries compared to number of instance you generated Right. That's a good question. So um What I was referring to is when we already have a database generated So generating a database takes a little bit more time than generating the queries So typically they provide heuristics where the user can select how many Queers should be generated for each database management system So typically I think we would generate around 10 000 queries for each database that is generated or a little bit more maybe Okay, so so sorry for for for this but like so because I saw all these examples It seems like you only need like a one or two or three Or zero to both in your in any instance Have you ever find a bug that requires like a large number of tubos to trigger that bug? Yeah, that's that's a good question. 
So the examples that I've shown you have been automatically reduced. The original test cases with which we found them had more complicated data and also more complicated queries, so they are basically reduced to the minimum needed to reproduce the respective bug.

Thank you.

So, if time allows, I can actually show you a demo later of how typical queries and data look in SQLancer.

All right, keep going. This is good.

Okay, thanks. So the first aggregate function that I showed you how to test, namely MAX, is a so-called self-decomposable aggregate function. I don't want to explain the formal underpinnings of this now, but I want to tell you that there is also another category, decomposable aggregate functions, and this is best explained with an example. The intuition here is that if you want to test, for example, the AVG aggregate function, then it is insufficient for each partition to compute a single value; rather, we need to keep track of the SUM and the COUNT to then derive the overall average. This basically allows testing decomposable aggregate functions by including tuples when computing the partitions. There is also another category of aggregate functions, namely non-decomposable ones; GROUP_CONCAT, which concatenates strings and where the order is actually important, is one example. We didn't consider testing them, because it would be more complicated, but also because we would expect that few optimizations are applicable to them, for the same reasons that make them difficult to test.

I think that's only in MySQL; that's not in the SQL standard.

Ah, that's possibly right.
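The AVG decomposition just described can be sketched as follows, again against SQLite and again with an invented helper (tlp_avg_check) and toy data. Each partition reports a (SUM, COUNT) pair, and the oracle recombines them into the overall average. In practice, floating-point averages may need an epsilon comparison; the toy data here happens to be exact.

```python
import sqlite3

def tlp_avg_check(conn, table, column, predicate):
    """TLP oracle for the decomposable AVG function: each partition
    contributes (SUM, COUNT), and the composed average is the total
    SUM divided by the total COUNT."""
    cur = conn.cursor()
    original = cur.execute(
        f"SELECT AVG({column}) FROM {table}").fetchone()[0]
    variants = [predicate, f"NOT ({predicate})", f"({predicate}) IS NULL"]
    total_sum, total_count = 0, 0
    for variant in variants:
        s, c = cur.execute(
            f"SELECT SUM({column}), COUNT({column}) FROM {table} "
            f"WHERE {variant}").fetchone()
        if s is not None:  # SUM is NULL when the partition has no values
            total_sum += s
            total_count += c
    composed = total_sum / total_count if total_count else None
    return original, composed

# Hypothetical table: AVG over 1, 2, 3 (the NULL is ignored by AVG).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(1,), (2,), (3,), (None,)])
original_avg, composed_avg = tlp_avg_check(conn, "t0", "c0", "c0 > 1")
```

If the engine's AVG disagreed with the recombined SUM/COUNT of the partitions, that would flag a logic bug in the aggregate computation.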
Yeah. So overall, we evaluated this approach in, I think, three months of implementing and running our tool and reporting and reducing bugs. In this time, we found close to 200 bugs, 123 of which have been fixed. We evaluated the approach on six database management systems; actually, only five are shown here, because for this latest approach we didn't find any bug in PostgreSQL. You can also see that in SQLite, which we had comprehensively tested before, we only found four new bugs. What I also want to stress here is that the total number of bugs that we found reflects our testing efforts, and not how well the database management system is tested or what its quality is. At this point, I also want to give a shout-out to the DuckDB and SQLite developers, who have been especially responsive and typically fixed our bugs within one day of us reporting them.

So which test oracles did we use to find these bugs? First of all, the WHERE oracle found most of them. It is not only the simplest, but it also seems to be the most effective test oracle. What I also briefly want to mention here is that in our evaluation, we found that this oracle has some overlap with one of the oracles that we had previously proposed. For the other oracles, we found fewer bugs, but still a number of interesting ones. And here I also want to note that we implemented these oracles only for database management systems for which our bug-finding efforts had saturated using the WHERE test oracle, and these are the ones highlighted in red. This is because we found that we often had problems filtering out duplicate bugs, and also because, for example, the HAVING test oracle can find a number of bugs that the WHERE oracle can also find, simply because it also generates WHERE clauses. Now I slowly want to start wrapping up.
I want to mention that TLP is a metamorphic testing approach, which has some implications. A metamorphic testing approach in this context means that we have a SQL query that we execute, and we get a result set; based on these two components, we derive a new SQL query that functions as the test case, and also a test oracle that decides whether the test case executes as expected. Now, the fundamental limitation of metamorphic testing approaches is that they cannot establish a ground truth. So it might be that we cannot detect a number of bugs simply because the database management system is consistent, but still buggy.

Also, to briefly compare it with the other approaches that we proposed: all three approaches that we have proposed so far can test WHERE clauses, but only TLP is applicable to comprehensively testing other features such as aggregate functions. On the other hand, PQS, the first approach that we proposed, is the only one that can establish a ground truth, but this is also reflected in the implementation effort: for PQS the implementation effort is moderate, but for TLP, as I have also demonstrated in the code, it is actually quite low, and for NoREC it is even lower.

So for PQS and NoREC, it's the same thing as TLP, where I would have to write some custom Java code that generates the queries? Exactly, yes. They are basically all based on the same table generator and also on the same query generator; it is then just configured to provide the test oracle that we define. Okay, so if I did what you did for DuckDB, or take the CockroachDB one, just those two class files would be enough to run all three tests? Actually, it looks a little bit different for each oracle that we want to support. Okay, I mean, I'd have to check myself, but I understand. Okay, cool. Keep going. Okay, sounds good.
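The core metamorphic check described above, the basic WHERE variant of TLP, can be sketched in a few lines. This is a toy illustration against SQLite with a made-up table `t0` and predicate `c0 = 3`: the rows fetched without a filter must equal the multiset union of the rows fetched under the predicate's three ternary outcomes.

```python
import sqlite3
from collections import Counter

# Metamorphic TLP check (sketch): SELECT * FROM t0 must return the same
# multiset of rows as the union of the three partitioning queries for
# p, NOT p, and p IS NULL. Table and predicate are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INT);")
conn.executemany("INSERT INTO t0 VALUES (?)", [(0,), (3,), (None,), (3,)])

predicate = "c0 = 3"
original = Counter(conn.execute("SELECT * FROM t0").fetchall())

partitioned = Counter()
for variant in [predicate, f"NOT ({predicate})", f"({predicate}) IS NULL"]:
    partitioned.update(
        conn.execute(f"SELECT * FROM t0 WHERE {variant}").fetchall())

# Any mismatch would indicate a logic bug in the DBMS under test.
assert original == partitioned
print("partitions consistent")
```

Because the derived queries are constructed from the original rather than from a known-correct result, the oracle can only detect inconsistencies, which is exactly the ground-truth limitation mentioned above.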
So I also want to briefly talk about the reception we have received so far. For the majority of the database management systems that we tested, we received quite positive feedback from the developers. Especially the CockroachDB folks have been very supportive and appreciative of our work. For example, Jordan from Cockroach Labs even said that we are doing the database industry a great service, and here Peter said that he is a fan of our metamorphic testing approaches. As another example, the DuckDB developers appreciate our testing efforts because DuckDB is quite a recent database management system, and Mark, who also presented here, I think, a couple of weeks ago, told us that he appreciates the work because otherwise users would have a worse experience, and also because we can provide more compact test cases to reproduce the issues.

And as a last example of developer reception: PingCAP, the company developing the database management system TiDB, we actually participated in their bug-finding challenge. In this challenge, 22 of the bugs that we reported were classified as P1 bugs, the second-highest severity level, and six as P2, which demonstrates that our bug-finding approach can find many severe bugs. Actually, when we redeem the points that we obtained, we can redeem them for, I think, 117 t-shirts. So did you get their t-shirts? Not yet, but let's see. I can also redeem them for hoodies, so maybe it won't be 117 t-shirts. Then, as Andy mentioned during the introduction, industry has started to adopt our approaches.
So we have been contacted by quite a number of companies, organizations, and developers that have already started implementing the approaches or were interested in implementing them, and of those, PingCAP already open-sourced their tool, which is called go-sqlancer, and they support all three approaches in their system.

Now, lastly, I also want to talk about bug importance. Of course, we found bugs of different severity levels, so many not-so-severe bugs, but also many severe bugs. But I want to argue that even the less severe bugs matter, and for this I have a concrete anecdote. A user of SQLite reported the following test case on the SQLite mailing list, stating that this query computed an incorrect result. Another user confronted the original bug reporter, saying that you cannot use this WHERE clause in production code; I mean, you can see it's quite convoluted. And the user defended themselves, stating that the query was generated by some middleware, so it was automatically derived, not user-specified. We actually found the same underlying bug and reported it to the SQLite developers, who quickly fixed it. So at the time the user reported this bug, it was actually already fixed in the latest development version. I believe this shows that even bugs that might be considered obscure can affect some users, especially when queries are generated by code generators.

I guess I'm already over time? You have five minutes for a demo. Go for it. Okay, sounds good. So basically, I'm going to the SQLite, sorry, to the SQLancer home page. I'm cloning the repository; it should take one second. Building should be quite trivial. We only see the PowerPoint. I only see the PowerPoint. I'm just stopping the share. Can you see it at all? Yes, great. Then let me continue.
So we're now in the directory of SQLancer. All we need to do now is use Maven to build the project, which should only take a couple of seconds. Here you can see that all the test cases execute correctly, because we don't have any at the moment, but hopefully soon. And then we already have this JAR that is generated, and we can directly launch it using Java. If you run SQLancer without any option, it will first print all the supported database management systems, so here, for example, DuckDB, and also the different options that are supported. Then we can select that we want to test DuckDB, and now every five seconds some progress information should be output, namely how many queries we generate and also how many databases.

What we can also see is that it says 50 percent of the queries are failing. Successful statements: 50 percent? So that means... oh yeah, yes, okay. So basically, we have a very naive database and query generation process, so most of the statements fail, or half, as shown here. What we can also quickly look at are the logs that SQLancer produces. And what we can see here is that first some tables are generated, then some random actions are selected. So here you can see a couple of INSERT statements, actually quite a number of INSERT statements. And if we go a little bit further below, you can see the queries that are generated. So here's an original query, actually quite similar to the MySQL example that I demonstrated, and then the partitioning queries, where we have the three ternary predicate variants. Well, I guess that's the overview. Yeah, but how do I know which one failed? Does your log file have that?
Oh yeah, so right now none of them failed, because we cannot find any more bugs in the latest version of DuckDB. But if it detects a bug, it will still fail with an error message and also explain what went wrong. But sorry, in your debug output when you were running it, it was saying successful statements: 50 percent? Ah, so that basically means it sends a query to the database management system, but the query might be semantically incorrect. Okay, okay. So syntactically correct ones, but not necessarily semantically correct. Got it. Okay. Okay, thanks. Okay, cool. Awesome.

All right. So again, we're virtual, so we can't applaud, but I'll applaud for everyone else. We have just two minutes left. Does anybody have any questions for Manuel? Again, introduce yourself and say who you are and where you're from.

Hi, I'm Stephen Frost; hopefully you can hear me. I'm with Crunchy Data. I think you mentioned this earlier, sorry, I was in the middle of something else when you asked, but I would be curious whether you, Manuel, have looked at either SQLsmith or maybe Jepsen. Those are two different tools that get used pretty heavily to test PostgreSQL. I'm also really happy to hear that PostgreSQL didn't have any bugs in your last round. Right.
So SQLsmith, I guess, is the most widely used tool for fuzzing and finding crash bugs, basically, but I guess our goals here are different, since we want to find logic bugs. Similarly with Jepsen. But I agree that these are very widely used and very useful tools as well.

Well, Jepsen is interesting because there are things you can do in terms of the serializability of transactions and ensuring that the database performs properly. So I wouldn't agree that it's necessarily just a fuzzer; it does go after some issues like that, which I would be interested to hear whether you've thought about. SQLsmith is a fuzzer; Jepsen is more targeted, but Jepsen is harder to set up. Actually, Stephen, have you guys tried the new one, Elle? I don't think I've played with Elle. Well, actually, I'm trying to remember if that was one of the ones that was mentioned here, because we were... Yes, Jepsen slash Elle. I mean, is that something that one of my co-workers, Peter Geoghegan, was just working through? With Kyle Kingsbury, he was working with Jepsen slash Elle to work through a serializable isolation issue in PostgreSQL. Yes, I'd love to see that run against other database platforms. Yes, okay. All right, cool. Awesome. Thank you.

Panos, I think you were next. You have a question? I will say, this is Panos Chrysanthis.
Great presentation, very interesting tool. Actually, I was triggered by one statement that you made about the other method, where you compare three different databases on their output, and then you said that this tool would possibly not work because of semantic inconsistencies in the language. Have you thought about using your tool to recognize such inconsistencies, and to compare the correctness, quote-unquote, of the semantics of the different SQL languages or flavors?

I think it would be interesting, but the thing is also that database management systems deliberately implement inconsistent semantics. So it's not that every database management system wants to closely follow the SQL specification, and also not everything is specified in the SQL standard. I think even CREATE INDEX, for instance, is not part of the specification, if I'm not mistaken. Or did I misunderstand your question?

Yeah, partially, yes, but I was curious whether that would actually be a good starting point. Everyone says, like JDBC says, that they offer you some common language so that you can connect to and interpret a lot of backend databases. How can you actually truly verify that when you are interpreting, you get the exact correct semantics? Or what would be the least common denominator? That would be interesting to see using these methods. It's just a thought that occurred to me while you were talking; it's a suggestion rather than a question.

Oh yeah, that's very interesting. So basically, to extract the common core between all the SQL dialects. That might be ANSI SQL-92 or so; I guess that's some minimal one. But it would be interesting to actually check whether all of the database management systems at least support this SQL dialect, and where they diverged at some point after the 90s.
So that would be interesting to see. Okay, all right, Raju, your quick question. Yeah, I wanted to know... Who are you, and where are you coming from? Oh, sorry. I'm Raju. I'm from Couchbase. So I'm interested in trying to find out how we can use this tool for our NoSQL database; that's the main thing I am interested in. So are there any next steps that you can suggest?

We already used it for one time-series database management system and also found bugs using it. I guess it would be very interesting to also investigate whether we can find bugs in document-oriented database management systems like MongoDB. I think that would be an interesting next step, but I guess at the moment we will still stick to diving a little bit deeper into relational database management systems. Okay, but thanks for the suggestion. When you said document databases, you said MongoDB; he's from Couchbase. Sorry, I didn't hear. Okay.

Okay, guys, this has been awesome. Again, this is a super exciting topic, and I'm glad to see that everyone is recognizing what your tool is capable of doing. So I'm excited to see how this grows and gets wider adoption. So again, thank you, Manuel, for staying up late with us. I realize it's 11:30 at night for you. We really appreciate this.