Hi, welcome to my talk about validatedb, which is about validating observations stored in a database. The take-home message of this talk: with validatedb you can validate your database records before any statistical analysis; you build rules with validate (a separate package) and execute them with validatedb; and validatedb is built on DBI, so it should work on any DBI-supported database.

As we all know, data cleaning is a very important step in our work, and one of the first things you should do when cleaning data is check whether it is valid — see, for example, the book we wrote. For instance, it is wise to check that age is non-negative, or that children do not have an income. Validation not only helps you check the quality of your data, it also helps you communicate about it, make your domain knowledge explicit, and reuse it over and over: checks can be repeated.

With validate — not to be confused with validatedb — you get a domain-specific language for specifying data checks. Each validation rule is an R expression resulting in a logical. Data can be checked against the rules, and you can compute all kinds of statistics on the errors you find. These validation rules are also used by other packages for imputation, data correction, and locating other errors.

With validate you specify a validator object — here directly in source code, but the rules can also be stored externally, in a database or in a YAML file with extra metadata describing the intent of each rule. The example checks that age is non-negative and whether there is an income. However, validate works on data frames. If you have large tables this can be a problem, so a common solution is to store the data in a database and do aggregations or selections just before analysis — but you still need to do some quality control on that data.
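As a minimal sketch of what such rules look like with the validate package — the column names `age` and `income` and the example data are illustrative assumptions, not from the talk's slides:

```r
library(validate)

# Specify validation rules as plain R expressions; each evaluates to a logical
rules <- validator(
  age >= 0,                   # age must be non-negative
  if (age < 15) income == 0   # children should not have an income
)

# Hypothetical example data with one violating record (a 12-year-old with income)
d <- data.frame(age = c(12, 35), income = c(5000, 30000))

# Confront the data with the rules and summarise pass/fail/NA counts per rule
cf <- confront(d, rules)
summary(cf)
```

Because rules are ordinary R expressions held in a validator object, the same rule set can be serialized to YAML, shared, and reapplied whenever the data is refreshed.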
validatedb can be used just like validate, but on a database table: instead of writing SQL statements to check your data, you write R statements — the same statements as with validate. The checks are lazy, and the results can be stored in the database, also in a sparse format.

Let's look at an example. Suppose we have a dataset with a child of 12 years old earning 5000 euros or dollars, stored in a database table, as you can see in the screenshot. If we define a rule set saying that if there is a salary, age should be over 15, we can see at the bottom of the screen that the first record fails this check, because it is a 12-year-old child earning a salary. So this works just like validate, but if you look at the object itself, you can see it is a SQL object: each check or rule is a lazy column in a lazy query, as you can see. Each column in this lazy query results in TRUE, FALSE, or NA per record for that rule. The rules are ultimately translated into SQL: the rules on this first dataset are translated into the SQL statements you can see over there.

There is also a sparse representation. Validating the rules normally produces, for each record, a row describing whether it passes or fails each check, but in the sparse format only the failing or missing results are stored.

Thank you for your attention. If you have any questions, just mail me — and I invite you to install the validatedb package. Thank you for your attention.
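The workflow described above can be sketched as follows. This is a hedged example assuming the validatedb API together with DBI and RSQLite; the table name `person` and its columns are illustrative, and `show_query()` is assumed to expose the generated SQL:

```r
library(validate)
library(validatedb)
library(dplyr)
library(DBI)

# Set up an in-memory SQLite database holding an example table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "person",
             data.frame(age = c(12, 40), salary = c(5000, 30000)))

# Reference the table lazily (no data is pulled into R)
person <- tbl(con, "person")

# The same rule syntax as validate: if there is a salary, age must exceed 15
rules <- validator(if (salary > 0) age > 15)

# confront() builds a lazy query; nothing runs until results are requested
cf <- confront(person, rules)
summary(cf)      # record 1 (the 12-year-old with a salary) fails
show_query(cf)   # assumed helper: the SQL the rule was translated into

dbDisconnect(con)
```

The key design point is that the confrontation stays in the database engine: each rule becomes a column in a lazy query, so large tables are checked where they live instead of being copied into R memory first.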