Hi everyone, it's nearly time to start... it's time to start. So I'm going to talk about Tanker, and if you'd rather follow at your own rhythm, you can download the presentation right now. First, a bit about me. Can everybody hear me correctly, or should I speak louder? Like that? Good. I have been doing Python development since 2006, plus some SQL, web development and general IT work. I'm also a long-time Tryton contributor; it's an ERP framework in Python, and I have spent so much time on it that it probably skews how I see development, how a database should work, and so on. Currently I'm an Adimian employee, working mainly as a consultant for ENGIE, an energy provider active mainly in France and Belgium. And Adimian is hiring, so if you are looking for a Python job, come ask us.

I will start with a bit of context about Tanker, to give an idea of why it was created; then I will show some examples of how it works, with some details about how to use it; and finally a few slides about the future of Tanker, mainly what I still don't like about it, what I would like to implement, and some ideas.

As I said, we consult mainly for ENGIE, and we work with a lot of external data: we download data from web services, we fetch CSV files from the web, we scrape websites, we pull files from FTP servers, things like that. That data is most of the time heavily denormalized, so completely the opposite of what you want in a database. Most of the time there is no explicit ID to identify a record, so we don't know if it's the first time we see a value or if it's an update of an existing one. Sometimes there is an ID but it is unreliable, and sometimes you don't even know what could be interpreted as an ID. And as I said, the sources are scraped websites, CSV, Excel, et cetera.
And when it's not external data, when it's data maintained inside the company, everybody loves to work with Excel, or at least everybody who is not a developer. And we hit the same issues: we have no IDs to work with in Excel. We could add some, but that creates other problems, and generally we have no metadata either.

More about the context, about how we get the data and what we do with it. We have a small amount of business data, concepts with little data but a lot of logic around them: for example, we manage data about power plants, fuels and shipments. And we maintain a lot of time series, like weather forecasts, plant production, that kind of thing. With time series we don't care about one line in isolation; we care about a lot of data at once, to compute averages and so on. So we are far away from an ERP or a blog application.

So the first question would be: we have ORMs in Python, why not use them to write that data into a database? But as I said, we work a lot with time series, so we don't need to instantiate an object for each timestamp; it's not really useful. We need other kinds of features than what the traditional ORMs offer. And I like, and you should like, the relational model: you don't always need objects in Python; you can work with relational-style data in Python too.

As I said, we have no IDs to identify records, but when you read those records, you can identify them. When you have a weather forecast, you have a timestamp, the region the forecast is for, and a temperature, so there is a way to identify each row, a way to know whether a data point is new or an update of an existing one. A classic ORM gives you no real support for that. So the idea is: OK, let's enforce this way of working, let's use a natural key, a business key, for everything, and let's build on that.
I will run the main example with a single data set. Let's say we have incoming data looking like this; it could be a CSV file or come from an Excel file. As you can see, we have different people who are members of different teams in different countries, and each person has a code which can act as a reference. It's a bit tricky because, as you can see, there are teams with the same name in different countries; we'll see how that plays out.

So in Tanker, how do you define your tables? In the example I show a YAML definition, but it could also be a plain Python dictionary. I hope you know how YAML works, but if not, it's not difficult: this document becomes a list in Python, each item of the list is a dictionary, and each dictionary defines a table: the table we want, its columns, and an index. Note also what we do not define in these tables: any ID of our own. Here, this means we are going to create a table called team, with a name column and a country column which references the country table, using the ID column of that table; so it's a foreign key. The country table is even simpler: it only has a name column, and of course we index on it. Then all we have to do is define the member table, which contains the name of the member, the code of the member, and the team this member is linked to. In this case we use the code itself as the index.

Now we have defined the layout of the database, and all that's left is to say which database we want to connect to. It can be either SQLite or Postgres, the two backends supported by Tanker. So here we have a dictionary giving the URI of the database we want to connect to, and then the schema, which is the YAML we saw earlier. With that, we can use the connect method, which is a context manager, and inside it we can create the tables.
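Since the slides are not in this transcript, here is a sketch of what that schema can look like, written as the plain Python structure the YAML document deserializes into. The exact type names and the `m2o country.id` foreign-key notation are assumptions based on the talk, not guaranteed to match Tanker's current syntax:

```python
# Schema for the talk's example: three tables, no explicit IDs declared,
# each table identified by a natural-key index instead.
schema = [
    {
        "table": "team",
        "columns": {
            "name": "varchar",
            "country": "m2o country.id",  # foreign key to country.id
        },
        "index": ["name", "country"],     # composite natural key
    },
    {
        "table": "country",
        "columns": {"name": "varchar"},
        "index": ["name"],
    },
    {
        "table": "member",
        "columns": {
            "name": "varchar",
            "code": "varchar",
            "team": "m2o team.id",
        },
        "index": ["code"],                # the member code is the key
    },
]
```

The same document written in YAML would parse to exactly this list of dictionaries, which is why Tanker can accept either form.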
The table definitions come from the config: if a table does not exist, we create it, and then we add the columns. Something that can be tricky elsewhere: you don't have to declare your tables in a specific order. In the example I define the team table, which references the country table, before country itself, but that's not an issue. There is, however, no support for migrations: if you change the type of a column, it won't work. But if you add a column and rerun the table creation, it will work.

So how do we put content in the database with Tanker? I show the countries here as a simple list; it's your job to extract that information from your source. Then we instantiate a View object. The first argument of the View is the table we want to write to, and the second one is the list of columns we want to fill; in this case, for country, name is the only column. Once the view is created, we call write, and it writes the countries for us, filling the name column. Quite easy.

Same for teams, it's really close: we create a view on the team table and we say we are going to map two columns now, the name column and the country column. The country, as I told you, is a foreign key, and here we say we are going to use the name column of country to identify the record. So Tanker knows: OK, this string is the name of a country, and I will maintain the foreign key in the team table for you. And then we can write it like that.

Ah, I didn't explain it: the View object is about the only object you really need in Tanker. You can see it like a view in the database, because you use it to select which columns you want to see; but here we also use it to write. So it's both how you view your database and how you write into it.
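The "create if missing, then add missing columns" behaviour can be sketched with plain sqlite3. This is not Tanker's actual code, just a minimal illustration of why re-running table creation is safe while type changes are not handled:

```python
import sqlite3

def ensure_table(db, table, columns):
    """Create the table if absent, then add any column not there yet."""
    cols = ", ".join(f"{name} {typ}" for name, typ in columns.items())
    db.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    existing = {row[1] for row in db.execute(f"PRAGMA table_info({table})")}
    for name, typ in columns.items():
        if name not in existing:
            db.execute(f"ALTER TABLE {table} ADD COLUMN {name} {typ}")

db = sqlite3.connect(":memory:")
ensure_table(db, "country", {"id": "integer primary key", "name": "text"})
# Re-run with an extra column: the table survives, the column is added.
ensure_table(db, "country", {"id": "integer primary key", "name": "text",
                             "iso_code": "text"})
```

Changing the declared type of `name` here would do nothing, which matches the "no migration support" limitation from the talk.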
After those two steps are done, in this case with SQLite, we can use the sqlite command line to check what we have done. We see our countries; the ID column was created for us with an auto-increment; same for the teams (and, oh, this is a wrong copy-paste); and the foreign key column here is filled with the correct country IDs. As you can see, we have two Blue teams in two different countries, but since our index is built on both columns, it's not an issue.

Now, how does it work exactly? Because, as I said, the goal is also to batch a lot of writes in one go. In this example I have three countries and three teams, so it's no big deal, but the goal is to write thousands of values at once. The first implementation wrote the data line by line, and it was super slow. So when you call write, the first thing Tanker does is create a temporary table. This temporary table has no constraints, so there is nothing to check when we write into it; it's really just a staging area. Then we call COPY FROM; COPY FROM is part of the Postgres API, it lets you upload something like a CSV into Postgres in one go, and it's the fastest way to get data into Postgres. In SQLite you can still write line by line, because there is no network cost to pay and you are in the same process, so it's fast anyway.

Once the temporary table is filled, we join it with the actual table, say team. This join tells us which lines of the temporary table are new and which ones already exist in the real table. We insert the new lines, and we blindly update all the other ones. So when you do a write like that, you overwrite everything, or at least all the matching lines; we don't try to detect that a line is completely equal and skip it; for now, we insert and update everything.
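That staging-table write path can be sketched with sqlite3 (Tanker uses COPY FROM on Postgres; the table and data here are just for the example). The join against the real table splits the batch into new rows and existing rows:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
db.execute("INSERT INTO country (name) VALUES ('Belgium')")

# 1. Load the whole batch into a constraint-free temporary table.
incoming = [("Belgium",), ("France",), ("Holland",)]
db.execute("CREATE TEMPORARY TABLE tmp (name TEXT)")
db.executemany("INSERT INTO tmp (name) VALUES (?)", incoming)

# 2. Join on the natural key: rows with no match are new, insert them.
#    (An update of the existing rows would use the same join.)
db.execute("""
    INSERT INTO country (name)
    SELECT tmp.name FROM tmp
    LEFT JOIN country ON country.name = tmp.name
    WHERE country.id IS NULL
""")

# 3. Drop the staging table, and we are done.
db.execute("DROP TABLE tmp")

rows = [r[0] for r in db.execute("SELECT name FROM country ORDER BY name")]
# rows → ['Belgium', 'France', 'Holland'], Belgium keeping its original id
```

The point of the temporary table is that the batch goes over the network once, while foreign keys and natural keys are still resolved by ordinary SQL joins.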
And when that is done, we drop the temporary table, and we are done. So it's a nice way to do the write in one batch through a temporary table while still keeping all the features like foreign key resolution: when you write a reference to, say, France, Tanker knows which ID it is, and so on.

So now we have data in the database, and we can read it back, still inside the connect context manager. As you can see, it's the same view definition; it could even be the same object if you wanted to keep it around. Here we say we want the name column and the country column, but: give me the name of the country, not the ID, I don't care about the ID. The read call executes the query and gives us back the Postgres or SQLite cursor, so you can loop over it, or call .all(), which builds the list for you. And you read back exactly what you put in.

A more complicated example now. We have put the teams and the countries in the database, and now we are going to add the members. And if you know pandas and the pandas DataFrame, Tanker supports it too: the earlier example used a list of lists, but a DataFrame can also hold your content. Here we have all our members, with their countries, codes and teams. To write them, we now have to tell Tanker how to map things. Before, we were passing a list, so the position of each value was enough: it was implicit that the first element is the name and the second one is the country; the position tells you what is what. Here we pass a dictionary, and when you give a dictionary, Tanker will take the name from the "name" entry of the DataFrame, the country from the "country" entry, et cetera. You can also give a list of dictionaries instead of a DataFrame, and it works the same.
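The positional-versus-named mapping can be shown in plain Python, leaving the Tanker-specific machinery out. With a list, the position decides which view column a value feeds; with a dict, the key does, so the order no longer matters:

```python
# The view's column list, as in the team example from the talk.
columns = ["name", "country.name"]

def map_row(row, columns):
    """Normalize one incoming row to the view's column order."""
    if isinstance(row, dict):
        return tuple(row[c] for c in columns)  # named: order-free
    return tuple(row)                          # positional: order matters

positional = map_row(["Blue", "Belgium"], columns)
named = map_row({"country.name": "Belgium", "name": "Blue"}, columns)
# Both yield ("Blue", "Belgium").
```

A pandas DataFrame behaves like the dict case: each view column is looked up by name among the frame's columns.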
So in this case we say: name is going to be the name of the member; country is the name of the country of the member's team; and then the team itself. Tanker will have to join all of those to know which team and which country the member is linked to. For the first row, we get Bob in the Blue team in Belgium. And this one is tricky, because, as you remember, we have to tell Tanker: this member is in this team. But saying "it's the Blue team" or "it's the Red team" is not enough, since we have several Blue teams in different countries, and the index on team is on both columns, the name and the country. So we have to give both pieces of information for Tanker to know exactly which team it is and recover the right ID from the database: it's the couple (team name, country name) that identifies the team exactly. Apart from that, it's no more complicated, it just works.

Once it's written, you can read back the result. Just before, I showed you how to read a list; after read, you can also call .df(), and this gives you a DataFrame back with all the results. And now, if we check the database, we see that it was, I think, Bob and Trudy: those two are both in a Blue team, but the first one is the Belgian team and the other one is the French team.

A nice feature: when you read from the database, you can read everything if you want, but most of the time you want to filter. All the filtering in Tanker is based on S-expressions; if you know them, you already know how this works. The S stands for symbolic; the main advantage of S-expressions is that they are super easy to parse, so it's easy to build a small language to express the filters. In this case, we want only the members whose team is in the country Belgium.
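The composite-key resolution can be sketched with sqlite3: a member row arrives with a team name and a country name, and we must recover the team's ID, precisely because "Blue" alone is ambiguous. The table layout below is an assumption matching the talk's example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT, country INTEGER,
                       UNIQUE (name, country));
    INSERT INTO country (name) VALUES ('Belgium'), ('France');
    INSERT INTO team (name, country) VALUES
        ('Blue', 1), ('Blue', 2), ('Red', 2);
""")

def team_id(db, team_name, country_name):
    """Resolve a team's ID from its composite natural key."""
    row = db.execute("""
        SELECT team.id FROM team
        JOIN country ON country.id = team.country
        WHERE team.name = ? AND country.name = ?
    """, (team_name, country_name)).fetchone()
    return row[0] if row else None

# Two "Blue" teams, told apart only by the country half of the key:
belgian_blue = team_id(db, "Blue", "Belgium")
french_blue = team_id(db, "Blue", "France")
```

This is the join Tanker has to perform for you each time a member row names its team by (team name, country name) instead of by ID.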
So we say we want to read the member table, and this filter says the country of the member's team must be Belgium. The first argument of read is a filter, and when we read back from the table, we get the list of teams. No... yeah, I think the example is wrong: it should be team.name here, because the result contains teams and not countries. But anyway, this is what happens behind the scenes: if you do this call, Tanker will generate this SQL for you. Implicitly, each dot in a column path generates a join in the SQL query. So we select the name of the team for each member, then we join on team, then we join on country, and we say the country name must be Belgium. Quite easy. And this is a small example with three tables, so that is the most complex join we can do here, but we have production servers with 60 tables, and sometimes a query has ten joins. Tanker will try to reuse joins when possible and will avoid doing stupid things with diamond queries, when several joins go through the same table but along different paths.

Now for a more complex query: here we also have a nested expression. I forgot to give some details, but it's quite easy; all you have to know is that it's operator first, then left operand, then right operand. So this reads like "team name equals Blue". And here we want the Blue teams, or the members whose code is in a given list; and then we get the members matching either condition. Here are the current operators: I showed or, equal and in, but you also have not-equal, greater, less, not, is, is not; quite handy. I'm not sure it covers everything, but until now I haven't been blocked.

So this is all quite easy so far. But the big problem when you work with external data is that you also have to update it; you have to maintain it.
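A toy translator shows why S-expressions are so easy to turn into SQL WHERE clauses: the operator comes first, then the operands, so the parser is a few lines of recursion. This is an illustration, not Tanker's parser, and the member codes are invented for the example:

```python
def to_sql(expr):
    """Translate a tuple-encoded S-expression into a SQL condition."""
    op, *args = expr
    if op in ("and", "or"):
        # Nested expressions: recurse on each operand.
        joined = f" {op.upper()} ".join(to_sql(a) for a in args)
        return f"({joined})"
    if op == "in":
        column, *values = args
        placeholders = ", ".join(repr(v) for v in values)
        return f"{column} IN ({placeholders})"
    left, right = args
    return f"{left} {op} {right!r}"

# "Blue team, or member code in a given list":
sql = to_sql(("or", ("=", "team.name", "Blue"),
                    ("in", "member.code", "E01", "E02")))
# → "(team.name = 'Blue' OR member.code IN ('E01', 'E02'))"
```

A real implementation would emit bound parameters instead of inlined literals, but the shape of the translation is the same.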
So here is all-new data; let's say we have scraped the website again or downloaded a new file. Trudy has disappeared, and Dan is now in the Red team in Belgium; he was in Holland before. And the code is more or less the same: the same view that maps the data to the database, and we write the new version of the members. By default, write will only insert and update, but won't delete. With the purge argument you say to Tanker: OK, if a line is missing, delete it too. So in our example, Trudy is missing, and the purge argument will remove Trudy from the database; without it, only Dan would be updated. You can also pass insert and update as arguments; by default they are true, but you can say insert=False, meaning: write the data to the database, but only apply updates, don't insert anything new. And then when we read back, Trudy has disappeared and Dan is now in the Red Belgian team.

Quickly, some other features that I haven't covered. ACLs, access control lists: you can define filters that are automatically enforced for you all the time. That's mainly for multi-user applications, where you want to say this user can update this table, or these lines in those tables, but not others. Connection pooling: instead of creating a lot of connections, Tanker maintains a pool, so you don't overload the database. You can work with multiple threads and with nested contexts, so if you want to connect to two databases and transfer data from one to the other, it works.

This one is not a feature but a limitation: we use temporary tables, and SQLite, as soon as you create a table, even a temporary one, issues a commit for you. So when you work with SQLite, you get a commit as soon as you start to write. If you do one write, it's not a big issue, but if you do several writes, a few of them can be committed under you, and then you wonder what's happening. There is also support for several ways to pass arguments.
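The purge behaviour can be sketched with sqlite3: after inserting or updating the incoming rows, any natural key present in the table but absent from the batch is deleted. The member codes below are invented for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE member (code TEXT PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO member VALUES (?, ?)",
               [("B1", "Bob"), ("T1", "Trudy"), ("D1", "Dan")])

# The new batch: Trudy has disappeared from the source.
incoming = [("B1", "Bob"), ("D1", "Dan")]
db.executemany("INSERT OR REPLACE INTO member VALUES (?, ?)", incoming)

# purge=True, sketched: delete rows whose key is not in the batch.
keys = [code for code, _ in incoming]
qmarks = ", ".join("?" * len(keys))
db.execute(f"DELETE FROM member WHERE code NOT IN ({qmarks})", keys)

rows = [r[0] for r in db.execute("SELECT code FROM member ORDER BY code")]
# rows → ['B1', 'D1']: Trudy's row is gone
```

Without the delete step you get the default behaviour: insert and update only, and stale rows stay in the table.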
You can also define default values in your schema; for example, a timestamp column with a default value of now, which is handy in itself. And there is an explicit delete method on View.

Now a short roadmap. Currently, foreign key resolution is done in Python: before writing, we have to resolve the foreign keys in Python, so we have to know that Belgium is ID 1 and France is ID 2, and that can be an issue with big tables; it's a bit difficult to explain. Postgres now supports upsert, so I think there is a way to use it and avoid the temporary table: the upsert syntax is INSERT ... ON CONFLICT ... DO UPDATE, which is quite handy. Maybe I could add more column types: currently we can create varchar, timestamp and a few others, and we may need more. There is no way to detect conflicts: if you read a bunch of data from the database, somebody updates it in the meantime, and then you write back, you are going to erase their changes without knowing it. Furthermore, currently we insert, update, or delete when something is missing; I already see some applications where we would want to trigger something else when a record appears. And a filter on write: if you could filter when you write, you would know you won't overwrite something you don't expect to write, and you also won't delete something you don't want to delete.

And that's all. I give some links: I used rst2pdf for the presentation, an old library but a good one, a nice library, plus some links about the things I talked about. Voilà. Any questions?

[Question from the audience.] No, currently there is no way to load the schema from an existing database. But it could be done, because I already do part of it: when I create a column, I have to know whether it's already there or whether it's a new column to create, so most of the code for that exists. But I haven't had the need yet. Yes. OK. Yeah.
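The upsert mentioned in the roadmap can be sketched in SQLite syntax, which since version 3.24 supports the same INSERT ... ON CONFLICT ... DO UPDATE clause as Postgres 9.5+. The population column is invented for the example; the point is that one statement both inserts new keys and updates existing ones, with no staging table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE country (id INTEGER PRIMARY KEY, "
           "name TEXT UNIQUE, population INTEGER)")
db.execute("INSERT INTO country (name, population) VALUES ('Belgium', 11)")

# Upsert: new natural keys are inserted, existing ones updated in place.
db.executemany("""
    INSERT INTO country (name, population) VALUES (?, ?)
    ON CONFLICT (name) DO UPDATE SET population = excluded.population
""", [("Belgium", 12), ("France", 68)])

rows = dict(db.execute("SELECT name, population FROM country"))
# rows → {'Belgium': 12, 'France': 68}
```

On the database side this also lets the engine resolve the conflict on the natural key itself, which is exactly the part Tanker currently does in Python.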