Thomas, and I'm working in the team where we process that data, where we make it available for the machine learning, and where we therefore build these APIs. That's the main topic of my talk. It's mainly three things, which you can see here in the title. It's building a multi-purpose platform, meaning we want to use it for different business domains; I will explain that later on. We have bulk data, so not just a few data sets but lots of data, which needs to be processed fast. And we do it using SQLAlchemy. These are the three points that you will see throughout the presentation: introducing a way of building data processing applications that can be used in many business domains. That's the topic.

We are building on SQLAlchemy. I assume most of you know it; this is just a statement from the website which I copied here. What I really like about SQLAlchemy is that it gives us so much flexibility in how to build an application. You really have both ends of the spectrum: you can work in a very database-oriented way for high performance, and you also have the ORM, where you can program in a much more abstract way. Within our application we use both of these flavors, depending on what area we are in and what exactly we do in the API processing part, as I will show here.

So let's build a multi-domain platform. What do we have to do? What do our customers expect from us? Well, we have to load the bulk data via a CSV file. This is what I will show in my example; in real life we also use XML files. We get that into the system via an HTTP interface with POST requests.
We have to verify that data. We have had several talks already about big data, which is not always clean and can be quite messy. So that our machine learning can work on it, we need clean data that has valid data references. That's one important part of what our application does. And we use it for different business domains. What we currently do in our company is for retail, for tourism, and for other areas; what I will show in the demo, I will explain on the next slide.

So there is still a lot to do on the technical side. We have to create a database schema based on the business domain we are in. We have to parse the CSV. We have to save the parsed CSV data to the database. We have to validate that data, and validation can mean multiple things: we have to check that the required fields are filled; we have to check that the data is correct, so that in a date field there is no time, for example, and no descriptions like "today" or "tomorrow"; and we have to check that the references between the data records are correct. We want to give feedback to the customer about the processing status of their data: whether it was accepted, whether we were able to process it, and what was done with it. And it is important for us that we can separate the data we received from the customer from the clean, validated data that we will use for machine learning. We always want to be able to track what was sent to us and what we made of it.

Having thought about that, let's have a look at our first customer. Our first customer is a pub. What could a pub want from a machine learning algorithm? It wants to predict how many drinks will be sold on the next evenings, so that they can plan accordingly: how much to buy, how many waiters to have. To do that, they want to send us their data.
They want to send us the drinks they have available, how many drinks were ordered per evening for the last half year, and how many visitors they had on each evening, so that we can do our learning on that.

What could the data model for this look like? It's quite simple. On the one hand we have the drinks; the orders reference the drinks; and the visitors are just another table that we have for information.

I told you that we need to separate the data we got from the customer from the data that we validated. To do that, we have two sets of tables. On the one hand, there are the stage tables. These hold the data just as we get it from the customer. So maybe they sent several updates: then we have several lines in there. Maybe there are some duplicates because they sent the same file twice: then we have several lines in there. And maybe there are some errors: then we also have them all in the stage. What we want to get out of that process is the core. In the core we also have the drinks, the orders and the visitors, but there we have one unique ID for each data record, and when we get updates to the data, we update that data record instead of saving it several times. So the machine learning algorithms can use that and be confident that they get sensible data.

What could such a CSV delivery look like? Well, let's take a simple pub. We have some beer, with some additional information such as alcohol content. We have whisky: let's say the pub is in Scotland, so we serve Scottish whisky, without an "e". And we have some coke for the people who don't want beer or whisky. And we have these orders. On that day we sold ten beer and eight coke; on the 11th of July, 15 beer and two whisky; and on the 12th of July we sold 13 beer and one... well, we got a new waiter from Ireland. There they write "whiskey" with an "e", which is bad for us, as we only know the Scottish whisky. But such things can happen. We get that as the delivery. Now, what do we want to do with that?
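
As a rough illustration of the delivery and of the field checks mentioned earlier, here is a minimal sketch using only the Python standard library. The column names, the year in the dates, the `problems` column and the helper names `validate_row` and `load_stage` are assumptions made for this example; they are not taken from the speaker's application. Every delivered row goes into the stage table exactly as received, and the checks only produce feedback.

```python
import csv
import io
import sqlite3
from datetime import date

# Illustrative delivery; column names and the year are made up.
ORDERS_CSV = """\
date,drink,count
2016-07-11,beer,15
2016-07-11,whisky,2
2016-07-12,beer,13
2016-07-12,whiskey,1
tomorrow,coke,3
"""


def validate_row(row):
    """Return a list of problems found in one delivered order row."""
    problems = []
    for field in ("date", "drink", "count"):
        if not (row.get(field) or "").strip():
            problems.append(f"missing required field '{field}'")
    if row.get("date"):
        try:
            date.fromisoformat(row["date"])  # rejects "today", "tomorrow", times
        except ValueError:
            problems.append(f"'{row['date']}' is not a valid date")
    if row.get("count") and not row["count"].isdigit():
        problems.append(f"'{row['count']}' is not a count")
    return problems


def load_stage(conn, csv_text):
    """Store every row as delivered and collect feedback for the customer."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_stage "
        "(date TEXT, drink TEXT, count TEXT, problems TEXT)"
    )
    feedback = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        problems = validate_row(row)
        conn.execute(
            "INSERT INTO orders_stage VALUES (?, ?, ?, ?)",
            (row["date"], row["drink"], row["count"], "; ".join(problems)),
        )
        if problems:
            feedback.append((line_no, problems))
    return feedback


conn = sqlite3.connect(":memory:")
feedback = load_stage(conn, ORDERS_CSV)
for line_no, problems in feedback:
    print(f"line {line_no}: {', '.join(problems)}")
```

Note that the "whiskey" row passes these field checks. Whether it refers to a known drink is a separate question, which is what the reference finding step is for.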
Ideally, our code should be able to find references between objects. So we have the stage table here. These are the orders we get: that's the external code you saw, that's the drink you saw, that's the count you saw in the orders table. And then there is this new column, the drinks reference. This is nothing the customer sent to us; it is the reference to the unique ID of that drink. We want to find those IDs; they are available here in the core. You can see the core table here has a unique ID, and we want to write that in there.

One implementation detail that I missed on the last slide: here we don't have a foreign key relationship between this column and this column. There can be anything in it, so at the moment it's empty. But in the core we defined a foreign key relationship between that table and that table, so that the database also ensures that there is really sensible data in here. So whenever we want to copy data to that table, we really need to make sure that we find the correct references first.

We do this in two steps. We have the reference finding step, which writes the references in here, and then, when they are in, the second step writes the validated data to the core. You can see it omits just this information but keeps the reference information, with the foreign key to the drinks table. And you also see that the last line is omitted: this "whiskey" could not be processed, so it was not copied in there. We have to decide in our application whether we throw an exception or whether we write some log file to give information to the customer in some way. But at least it should not get into the core.

So the task for all of us is: how can we write the code that does these steps? We have several possibilities.
We have plain SQL. It works fine, and if we want to start playing around, that's always a good choice: just playing around with the database to see that there is really a sensible way of doing it.

We can do it in the SQLAlchemy Core model, which closely resembles SQL. Here we have orders stage: this is a SQLAlchemy metadata object which contains the information about the stage table for the orders. We issue an update statement and say what the values are: the drinks reference column should be filled by a select, and we want to select the IDs of the drinks core table where the external code of that core table equals the drink's name in the orders stage. Let me just go back to show it to you again: we want to make sure that this ID gets into that column, exactly where this name of the drink matches the external code in the core. This works fine. That's a nice idea, and I would say it might be the best one for the implementation.

But in the back of our heads we have the thought that we might get different customers with different models, and we are thinking that maybe it would be a good idea to look at the ORM, so that we are more flexible there. Here we have the tables as objects and each row as an object, and it's much nicer to implement things. We can loop over the orders, query the table with the right filters, and update the table. That works fine too. But as we do a single database access per row here, it might not be a good idea from a performance point of view when we really have big customers. Still, these are the tools we have at hand at the moment. So let's assume that in our team we used that statement, and we are really happy: everything works fine.
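
The slide itself is not part of this transcript, so the following is only a sketch of what a hard-coded SQLAlchemy Core version of the two steps might look like, written against the SQLAlchemy 1.4/2.0 API. Table names, column names and the sample rows are assumptions; the count column is called `quantity` here purely for illustration. Step 1 fills the reference column with a correlated subquery, and step 2 copies only the resolved rows into the core.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        create_engine, insert, select, update)

metadata = MetaData()

drinks_core = Table(
    "drinks_core", metadata,
    Column("id", Integer, primary_key=True),
    Column("external_code", String, unique=True, nullable=False),
)
orders_stage = Table(
    "orders_stage", metadata,
    Column("date", String),
    Column("drink", String),
    Column("quantity", Integer),
    Column("drinks_reference", Integer),  # deliberately no foreign key in the stage
)
orders_core = Table(
    "orders_core", metadata,
    Column("id", Integer, primary_key=True),
    Column("date", String),
    Column("quantity", Integer),
    Column("drinks_reference", Integer, ForeignKey("drinks_core.id"),
           nullable=False),
)

engine = create_engine("sqlite://")  # in-memory database for the sketch
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(insert(drinks_core),
                 [{"external_code": code} for code in ("beer", "whisky", "coke")])
    conn.execute(insert(orders_stage), [
        {"date": "11 July", "drink": "beer", "quantity": 15},
        {"date": "11 July", "drink": "whisky", "quantity": 2},
        {"date": "12 July", "drink": "beer", "quantity": 13},
        {"date": "12 July", "drink": "whiskey", "quantity": 1},
    ])

    # Step 1: reference finding, executed as a single statement in the database.
    find_references = update(orders_stage).values(
        drinks_reference=select(drinks_core.c.id)
        .where(drinks_core.c.external_code == orders_stage.c.drink)
        .scalar_subquery()
    )
    print(find_references)
    conn.execute(find_references)

    # Step 2: copy only rows with a resolved reference into the core.
    load_core = insert(orders_core).from_select(
        ["date", "quantity", "drinks_reference"],
        select(orders_stage.c.date, orders_stage.c.quantity,
               orders_stage.c.drinks_reference)
        .where(orders_stage.c.drinks_reference.is_not(None)),
    )
    conn.execute(load_core)

    loaded = [tuple(row) for row in conn.execute(
        select(orders_core.c.date, orders_core.c.quantity,
               orders_core.c.drinks_reference).order_by(orders_core.c.id))]
    unresolved = conn.execute(
        select(orders_stage.c.drink)
        .where(orders_stage.c.drinks_reference.is_(None))
    ).scalars().all()
```

Both statements run entirely inside the database, which is the throughput argument made in the talk. An ORM loop over the orders would instead issue one query and one update per row.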
We have great data, the customer is happy, our data scientists are happy, everything's good.

Now, you can see it as good or bad: the customer, the pub, is happy, and she tells the brewery about it. They talk when the pub gets a new delivery, and the brewery is quite excited, because they say: "Well, that machine learning stuff you read about in the newspapers, we are thinking about that for our brewery. We have some machines, the boilers and the fermenters, and we have some sensors in there. We measure things like temperature and pressure. There must be some way to find out... I mean, brewing and storing beer is quite a long process. We want to know at the beginning what the quality of our beer will be in the end. Couldn't you help us with that?" Our data scientists are quite happy with that interesting new task, and we just need to get the data into the system. That can't be that complicated.

Well, looking at that statement here, it might be that we have to rewrite all of it, because there are different references between the categories. We now have machines, we now have sensors, we now have measurements; all are named differently. To make that customer happy, we would have to rewrite the complete statement. It would work, but when we look into the future, and maybe there are more interesting business domains, then we might have really lots of work to do.

So what could be the solution? We sat down as a team and said: we could describe these things in a more abstract way. We can say we have one business domain, which is the pub at first, and the pub consists of categories: we have the drinks, we have the orders, and we have the visitors. They consist of elements: that's the external code, that's the reference to the drinks.
They have some types that we need for the database. And some of these elements are special: we looked at that reference finding task and saw that they need special processing. It would be good to have a way to mark these as special elements, so we can inherit here from that element. Each element also has a name which we see in the CSV file, and it has a name in the database, which in most cases is the same name in capital letters. But for this orders' drinks reference, you have seen, there is an additional field: this name reference. You remember that in our reference finding step we wanted to fill that column in the stage table. So we add this in our subclass, and we say that this belongs to the category.

So how does it help us if we can do that? We also thought that this somehow resembles a SQLAlchemy model. A SQLAlchemy model also has tables, it has elements, it has types, it has foreign keys. But the SQLAlchemy model is for describing a database; it is not so much for algorithmic processing of that stuff. I will discuss that at the end, but that is why we said it really makes sense to have this in a more abstract way.

Now, what does that look like? We have parsers here; for example, we have this generic parser. Here we have the SQLAlchemy model with our business domain, and the business domain is also described in the application. So in both places we need to have that business domain. What we wanted to do was factor out that knowledge, so that the application does not need to know the business domain, and we can really put it here in the domain model. And we can have task-specific renderers: for reference finding we have one renderer here, which uses that information to generate SQL statements, and we can have one for other tasks as well. What does it look like?
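
The lingo package is internal and its classes are not shown in this transcript, so the following is only a guess at how the structure described above could be expressed with plain dataclasses. All class names, attribute names and the `_REFERENCE` suffix are hypothetical. The points it illustrates are the ones from the talk: an element has a CSV name and an upper-case database name, and a reference is a subclass of element that knows its target category and the extra stage column to fill.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    name: str              # name as it appears in the CSV header
    db_type: str = "TEXT"  # column type needed for the database

    @property
    def db_name(self) -> str:
        # In most cases the database name is the CSV name in capital letters.
        return self.name.upper()


@dataclass
class Reference(Element):
    target: str = ""       # name of the referenced category

    def __post_init__(self):
        if not self.target:
            raise ValueError(f"reference '{self.name}' needs a target category")

    @property
    def db_name_reference(self) -> str:
        # Extra stage column that the reference finding step fills.
        return f"{self.db_name}_REFERENCE"


@dataclass
class Category:
    name: str
    elements: List[Element]

    @property
    def references(self) -> List[Reference]:
        return [e for e in self.elements if isinstance(e, Reference)]


@dataclass
class Domain:
    name: str
    categories: List[Category]

    def category(self, name: str) -> Category:
        return next(c for c in self.categories if c.name == name)


# Hypothetical description of the pub domain from the talk.
pub = Domain("pub", [
    Category("drinks", [Element("external_code"),
                        Element("alcohol_content", "REAL")]),
    Category("visitors", [Element("date", "DATE"),
                          Element("count", "INTEGER")]),
    Category("orders", [Element("date", "DATE"),
                        Reference("drink", target="drinks"),
                        Element("count", "INTEGER")]),
])
```

Because the description is ordinary data, different renderers can walk over it: one to create tables, one to generate reference finding statements, and so on.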
Here is a code sample for the pub. We have the domain, the category, and the elements from our lingo package. We have the pub, which is a domain, and the pub consists of three categories: the drinks, the visitors, and the orders. That's quite easy to write down. There is nothing more than you need, and it is easily understandable. The interesting part is the reference, which, as you see, links directly to the drinks.

Having that, what does the task-specific renderer look like? This is Python code which first checks, for each category, what the references in there are; we can check that with isinstance. We loop over the references. Here we have just one, but there might be other categories with multiple ones. We get the stage and core tables from the SQLAlchemy metadata: as you can see, we can find the stage table by its name, and we can find the referenced core table. Then we issue an update statement. You see: update the stage, with what values? These values are constructed dynamically. Because we cannot write the keyword arguments dynamically, we construct a dict first; this is the update dict with the keyword values. You see here the column which should be updated, and this is a SQLAlchemy Core statement saying what to update it with. We print that, as I will show in the demo, and then we can execute it.

So now let's switch to the demo, so that we can see that this really works. What I prepared for the demo is a simple SQLite database, and I have prepared some scripts. No... let's do it that way. Okay, great. So at the moment we do not have the SQLite database.
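
The renderer described a moment ago could look roughly like the sketch below. It is not the speaker's code: the small `Element`, `Reference` and `Category` classes stand in for the internal lingo package, and the table naming scheme (`<CATEGORY>_STAGE`, `<CATEGORY>_CORE`, `<ELEMENT>_REFERENCE`) is assumed. It targets the SQLAlchemy 1.4/2.0 API and follows the steps from the talk: find references with `isinstance`, look up the tables in the metadata by name, and pass the values as a dict because the column name is only known at run time.

```python
from dataclasses import dataclass
from typing import List

from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, insert, select, update)


@dataclass
class Element:
    name: str


@dataclass
class Reference(Element):
    target: str = ""  # name of the referenced category


@dataclass
class Category:
    name: str
    elements: List[Element]


# Hypothetical pub domain: only the orders carry a reference.
pub = [
    Category("drinks", [Element("external_code")]),
    Category("orders", [Reference("drink", target="drinks"),
                        Element("quantity")]),
]

metadata = MetaData()
Table("DRINKS_CORE", metadata,
      Column("ID", Integer, primary_key=True),
      Column("EXTERNAL_CODE", String))
Table("ORDERS_STAGE", metadata,
      Column("DRINK", String),
      Column("QUANTITY", Integer),
      Column("DRINK_REFERENCE", Integer))


def render_reference_finding(categories, metadata):
    """Build one UPDATE statement per reference element in the domain."""
    statements = []
    for category in categories:
        references = [e for e in category.elements if isinstance(e, Reference)]
        for ref in references:
            stage = metadata.tables[f"{category.name.upper()}_STAGE"]
            core = metadata.tables[f"{ref.target.upper()}_CORE"]
            lookup = (
                select(core.c.ID)
                .where(core.c.EXTERNAL_CODE == stage.c[ref.name.upper()])
                .scalar_subquery()
            )
            # The column name is only known at run time, so the values are
            # passed as a dict instead of keyword arguments.
            update_values = {f"{ref.name.upper()}_REFERENCE": lookup}
            statements.append(update(stage).values(update_values))
    return statements


statements = render_reference_finding(pub, metadata)

engine = create_engine("sqlite://")  # in-memory database for the sketch
metadata.create_all(engine)
orders_stage = metadata.tables["ORDERS_STAGE"]
with engine.begin() as conn:
    conn.execute(insert(metadata.tables["DRINKS_CORE"]),
                 [{"EXTERNAL_CODE": "beer"}, {"EXTERNAL_CODE": "whisky"}])
    conn.execute(insert(orders_stage),
                 [{"DRINK": "whisky", "QUANTITY": 2},
                  {"DRINK": "whiskey", "QUANTITY": 1}])
    for statement in statements:
        print(statement)
        conn.execute(statement)
    resolved = [tuple(row) for row in conn.execute(
        select(orders_stage.c.DRINK, orders_stage.c.DRINK_REFERENCE)
        .order_by(orders_stage.c.QUANTITY))]
```

Nothing in `render_reference_finding` mentions drinks or orders, so a different domain description yields different statements without touching the function.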
You see here that these tables do not exist at all. We want to create them, and we do that by calling our Python script to create the database. Now the tables are there, as you can see. By the way, we have a configuration file where we say what the database is and what our business domain is.

Now we need to get data in there, so we call the Python script that does the CSV import. We do that first for the drinks category. You see here the SQL statement created from it, and the data is in the category. Now we do that for the orders as well, and you see they are in here.

Now we want to do the reference finding. Of course, before we find the references, the drinks need to get into the core; otherwise it doesn't make sense. So what do we do? We call the core load script, first for the drinks. You see here the generated SQL statement, and this is the core. And you see that the drinks reference column is still empty. So now for the most interesting part: you can see these SQL statements being issued, and here you see that the column is filled and also that the orders core is filled. It does not fit completely on the screen, but you can see that it's all in there. So that works fine. That's great. But that is the step where we were already five slides ago. Thank you.

Now let's go back and say we have another domain. We have a brewery, and the brewery has machines, it has sensors, and it has measurements. You can see here that the sensors reference the machines and the measurements reference the sensors. I would like to show that in the demo as well, but unfortunately there is not enough time for that. It is really as simple as what we have seen here, though: you just need to change the config JSON, you create the database, and the new tables will be there. And you can import the data, and the references we find will be in there as well.

So what does this domain model give us?
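
Since the brewery part of the demo was skipped, here is a small standard-library sketch of the idea that switching the domain is only a configuration change. The layout of the config, the dict-based domain descriptions, the column names and the `create_schema` helper are all assumptions made for illustration; the real configuration file and scripts are not shown in the talk.

```python
import json
import sqlite3

# Hypothetical domain descriptions: category -> {column: referenced category or None}
DOMAINS = {
    "pub": {
        "drinks": {"external_code": None},
        "orders": {"drink": "drinks", "quantity": None},
    },
    "brewery": {
        "machines": {"external_code": None},
        "sensors": {"external_code": None, "machine": "machines"},
        "measurements": {"sensor": "sensors", "temperature": None,
                         "pressure": None},
    },
}

# Hypothetical configuration: which database and which business domain to use.
CONFIG_JSON = '{"database": ":memory:", "domain": "brewery"}'


def create_schema(conn, domain):
    """Create one stage and one core table per category of the domain."""
    for category, columns in domain.items():
        name = category.upper()
        stage_cols = []
        core_cols = ["ID INTEGER PRIMARY KEY"]
        for column, target in columns.items():
            col = column.upper()
            stage_cols.append(f"{col} TEXT")  # stage keeps the delivered value
            if target is None:
                core_cols.append(f"{col} TEXT")
            else:
                # Stage: plain column, no foreign key. Core: enforced reference.
                stage_cols.append(f"{col}_REFERENCE INTEGER")
                core_cols.append(
                    f"{col}_REFERENCE INTEGER NOT NULL "
                    f"REFERENCES {target.upper()}_CORE(ID)"
                )
        conn.execute(f"CREATE TABLE {name}_STAGE ({', '.join(stage_cols)})")
        conn.execute(f"CREATE TABLE {name}_CORE ({', '.join(core_cols)})")


config = json.loads(CONFIG_JSON)
conn = sqlite3.connect(config["database"])
create_schema(conn, DOMAINS[config["domain"]])

tables = sorted(
    row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Changing `"domain"` to `"pub"` in the configuration would create the drinks and orders tables instead, with no change to `create_schema`.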
Well, it is optimized for high throughput. As you see, the SQL statements that are issued can be processed directly on the database. So we put everything into the stage in the database, then we do some processing with our domain knowledge and generate the SQL statements, and the work is done on the database. That's a really good fit for analytical models. When I thought about the demo for this talk, I found it might not fit that well for transactional models, where you have more complex n-to-n relations. But for analytical models this is really great and helps a lot.

When I compare that to a SQLAlchemy model, which is also some kind of metamodel, we see that SQLAlchemy is focused on describing a database. The domain model, in contrast, can contain more information. In our team we also had the task of handling time-dependent things: some drinks are only available on certain days, or maybe they were available last week but are not available this week, and we need to check these cross-time dependencies as well. This can be done in the domain model too; we can note it there. And we can generate the SQLAlchemy model out of the domain model, so in that case we have both.

An additional bonus is that we can use the domain model for much more. You can see here: we can generate the SQLAlchemy model, we can generate SQL for our tasks, we have the CSV loader configuration. We also generate documentation out of it, on how to fill the CSV tables, and we can generate demo data, and much more. That's just to give you some ideas, so that you are also able to ask some questions. I will close here; that's what I wanted to show you. Are there any questions?

Is the lingo library open source and available?

No, this is something we developed internally.
What I did for this talk was prepare a small demo application, and I also thought about providing that. But I have seen that it takes a lot of stuff around it to make the example somewhat sensible. And before making it open source, I would also have to check internally in our company. Before going in that direction, I just want to see whether there is interest at all. So if you have questions about that, or want to get further updates, we can just talk after the talk.

Any more questions? No? All right. Thank you.