Hello everyone, my name is Marcin Wolski, and I'm very happy to meet with you, this time remotely. I'm very excited to present to you the product of more than a year of development at Registify. There is a surprise gift for you at the end of this presentation. Other than that, the presentation is split into two parts. In the first part I will try to explain the main concepts behind Registify and how we approach the ETL problem. In the second part I will walk you through a hands-on example of how to use Registify with our package, which we also provide.

So let me start with the idea. What does a typical data process look like? Normally we have multiple data sources, and the ETL process would ideally be designed against each individual data source. That is very time-consuming, it requires a lot of effort and a lot of bookkeeping, and it is very prone to error. At Registify we came up with a solution that adds an intermediate step. The idea behind it is, first of all, that we aggregate different data sources. The second benefit is that we remember how the data communicate with our engine and how they are fed into the end user's data template. Because of this, you only need to communicate with Registify, and Registify will learn from this history of communication and use it to improve the accuracy of working with different data sources.

The process of working with Registify is really split into two sub-processes. In the first step you need to let the Registify engine understand what data structure and what data you are working with. In this respect it will try to distinguish between empty and non-empty dimensions.
If a dimension is empty, the engine will understand that this is the place where you would like the data to appear. If it is non-empty, it will try to learn what this data is and how we can work most effectively with the data you already have in your data template. In this respect we classify the data against six potential classes, which cover the majority of the working examples we have in our repositories, and the engine will try to propose one of five possible matching algorithms. In the second step you just need to let Registify fill the data. If there are any adjustments to be made, you can also make them in this step, and you have control over everything the engine is doing.

As a quick example to better explain what we mean by a data template, take a simple panel data structure with a year dimension, which is a four-digit year representation, a country dimension, which is the English name of a country, and GDP. Here you see that the GDP dimension is empty, it has missing values, and the engine will implicitly understand that you would like the figures to appear in this dimension, corresponding to the other dimensions of your data template.

Let's now turn to the Registify package. Access to Registify is available through three different channels: Google Sheets, a Python module, which is still in development, and the R package. To use the R package, you first need to install it from GitHub and load it in your current R environment. Afterwards you need to register your account; you can do it for free at Registify.com. To make the package work in the R environment, you first need to set the API URL addresses for the Registify endpoints, and this is done using the function set curl. Here we explicitly show you that we allow the package to learn from our data. This is to make sure that your experience with Registify is customized to your data preferences and enhanced by the AI algorithms working in the background at Registify.
You can always turn this learning option off, but then you will not have a customized experience when using the package. You also need to use the register function to load the token and email that you obtained when registering an account. It is very simple and straightforward, and that's basically it. Once you've done that, you can enjoy the full power of the package.

There are two core functions in the package, Analyze and Fill, and they correspond to what I told you in the first part of this presentation about how Registify approaches the ETL process. There are also two very useful support functions: the visualization tool and the Adjust function. The Adjust function allows you to change the data elements or matching preferences that Registify should use in your data process.

Time for a data example. Let's first create a data template. The idea here is very simple. We would like to create a vector of days, which we will make look very unusual. This is done on purpose, because we are hoping to give you an idea of the advantage of using the Registify engine. We will have the vector of days from the 1st until the 10th of June and put it in the first column. In the second column we will put ITA, which is the three-letter ISO code for Italy, and the third column will be missing. If we have a look at the data structure, it looks like that. The third column, as I told you, is missing, and it has the header COVID. This header title will be used as a keyword in the search engine to propose the best data repository, the one that can contain data points that fit into your data template in the best way.

Analyzing the data using the Analyze function, we see a lot of details.
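To make the template from this example concrete, here is a minimal sketch in plain Python. It is not the Registify package or its API: the year 2020, the `dd|mm|yyyy` date format, and the helper `empty_columns` are all assumptions made for illustration. It only shows the shape of the template and the idea of telling empty dimensions from non-empty ones.

```python
from datetime import date, timedelta

# Hypothetical stand-in for the template described in the talk.
# The year is an assumption; the talk only says 1st to 10th of June.
start = date(2020, 6, 1)
days = [start + timedelta(days=i) for i in range(10)]

template = {
    # A deliberately unusual date format (assumed), e.g. "01|06|2020".
    "days": [d.strftime("%d|%m|%Y") for d in days],
    # Three-letter ISO code for Italy, repeated for every row.
    "country": ["ITA"] * len(days),
    # Empty dimension: the place where the data should appear.
    "covid": [None] * len(days),
}

def empty_columns(table):
    """Return the names of columns in which every value is missing."""
    return [name for name, values in table.items()
            if all(value is None for value in values)]

print(empty_columns(template))
```

In this sketch only the `covid` column is reported as empty, which mirrors the engine treating that column as the one to be filled.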
The details in the Analyze output are there to describe the structure of your data set comprehensively, but what I want to draw your attention to is the classification. Registify classifies the three dimensions as time, geography and general, so they correspond to days, country and COVID. Moreover, for the COVID column the repository provider was proposed to be Registify, which means that the repository is supported by the Registify engine, and the table name is COVID-19 CSSE. This is the repository made available by Johns Hopkins University.

If you run the fill command on this structure, you will see that numbers appear in the missing places, and they correspond to the number of COVID cases in Italy on the given days. But to make sure that we understand the metadata and how the matching was done, we can always visualize the results.

The visualize function takes two structures, structure X and structure Y. Structure X is the structure of the data that the user provided to the engine. Structure Y is the structure already embedded in the Registify engine, and it corresponds to the COVID header of the empty column. Basically, how it works is that the engine took COVID as a keyword, proposed COVID-19 CSSE as the most representative data source for this keyword, and used the embedded structure of the COVID-19 CSSE table as structure Y.

Two things are relevant here. First of all, the colors correspond to the dimensions being matched: here days was matched with the time dimension, and country was matched with the country origin dimension. Second, you see that structure Y has more dimensions, and some of them were not used for matching. For these dimensions Registify proposed default values, so in this case we take the total number of confirmed cases.
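The matching idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how the Registify engine is implemented: the function `fill_template`, the dimension name `indicator`, and the numbers in `repository` are made-up placeholders, not real COVID-19 figures.

```python
# Toy "structure Y": repository rows with more dimensions than the template.
# All values are placeholders invented for this sketch.
repository = [
    {"time": "2020-06-01", "country": "ITA", "indicator": "confirmed", "value": 100},
    {"time": "2020-06-01", "country": "ITA", "indicator": "recovered", "value": 40},
    {"time": "2020-06-02", "country": "ITA", "indicator": "confirmed", "value": 110},
    {"time": "2020-06-02", "country": "ITA", "indicator": "recovered", "value": 45},
]

# Toy "structure X": the user's template with an empty covid column.
template = [
    {"days": "2020-06-01", "country": "ITA", "covid": None},
    {"days": "2020-06-02", "country": "ITA", "covid": None},
]

# Template column -> repository dimension it is matched with.
matches = {"days": "time", "country": "country"}
# Repository dimensions not used for matching fall back to a default value.
defaults = {"indicator": "confirmed"}

def fill_template(template, repository, matches, defaults, target="covid"):
    """Return a copy of the template with the target column filled."""
    filled = []
    for row in template:
        # Build the lookup key from matched dimensions plus defaults.
        wanted = {dim: row[col] for col, dim in matches.items()}
        wanted.update(defaults)
        hit = next((r["value"] for r in repository
                    if all(r[k] == v for k, v in wanted.items())), None)
        filled.append({**row, target: hit})
    return filled

confirmed = fill_template(template, repository, matches, defaults)
recovered = fill_template(template, repository, matches, {"indicator": "recovered"})
```

Changing the entry in `defaults` changes which series ends up in the empty column, while the matched dimensions stay the same.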
If for some reason we would like to adjust this, for instance by taking the number of recovered cases instead of the number of confirmed cases, we can communicate that to the Registify engine using the Adjust function. The Adjust syntax takes three elements: the first one is what we would like to change, the second one is which column, that is, which dimension, we would like to change, and the third one is what we would like to change in it. In this case we would like to change the concept indicator from confirmed cases to recovered cases. If we feed this adjusted data structure into the fill function, we see that Registify has now changed the values: they now correspond to the number of people recovered from the virus in Italy on the given days.

And that's it. I hope you enjoyed it. If you would like to learn more about Registify, you are invited to visit our website. The details of the R package are in the manual provided with the package itself, but it is also described comprehensively, with a lot of examples and explanations, on our website. You can find the list of available repositories at Registify.com/repos. For those of you who would like a hands-on experience with our package, we have a gift: three months of premium access. Upon registration on our website you can type in the voucher code, which is case sensitive, and you will be given three months of premium access to our engine. Thank you very much.