 Thank you Morning everyone now. Thanks for coming up had a bit technical issues to start the day But it was all while you weren't here. So nothing happened. It's all going great so far So party like it's ANSI 99 but first I Used to work at an investment bank. I'm sorry and But it wasn't this type of stuff It was closer to that type of stuff But specifically and exactly, you know, every once in a while my manager was out so I could You know relax a bit Hi, I'm hus and the CTO and co-founder of a startup called 9th in we're about 18 months old And we use a computer vision and machine learning to automate the extraction of financial data from documents Before that as I mentioned worked in investment banking in the technology department in the equity trading team So doing all kinds of stuff with equities trading technology bespoke trader tools and risk and control systems and Background before that was aeronautical engineering. So a lot of stuff about airplanes There's a the slides should be up. I guess once we're done with the day But that's where you can find them on my speaker deck page, but when I hear talk about any of that today Let's talk about ANSI 99 party, this is This is going to be fun. He goes always fun for me. I hope it's fun for you And he stands for the American National Standards Institute It's a bit of a dated term because now they go by the term a sequel colon a year So we're currently on sequel 2016. That's the current live standard of her language But 99 makes a bit of pun and I like puns. So brace yourselves There was supposed to be two more puns in here But it was content didn't quite match up with what I was speaking about and that's fine because you probably don't want me up here Pontificating too much about about what I'll be talking about anyone catch that one This is you you only have yourselves to blame So let's talk about the hitter bases and Specifically relational database systems and they have a long long history So let's chart the timelines of three are probably the most popular The hitter bases in terms of installation base apart from one notable omission First, let's follow postgres QL or postgres started out as Berkeley ingress in high high in in 1970s a lot of actually trace their roots back to that university at that particular time and then through a series of quite regular updates, you know postgres has been Interatively improved upon from all that time up to postgres 11 got released in beta. I think last week my Microsoft sequel server actually started out life as Cybase and then due to a fork happening in about the early 90s it eventually found its way into becoming a product that's now associated with Microsoft and The massively used my SQL started off as msql in 99 is a in 90s and I'm highlighting there pep 248 DB API version 1.0 was First kind of released that was 1996 There was only a few years after Python the language first also got released so databases and Python go back a long long way But what's gonna help us understand even more. Let's do some software archaeology to kind of understand where we're coming from Big Co EE EE stands for enterprise edition siloed groups with Behave with behave with big databases. I see a lot of people Probably recognizing this image of architecture from large institutions technologies functions that kind of split For each system and the database is a really the system of record and the final destination for all the data That's produced by one particular team and one particular department and to get data across It's normally an extraction job or to shift stuff around the files being transferred And you ended up being in specific you ended up having a specific sequel developers that were doing sequel as their primary as their primary application language, so Programming stuff in the database using things like star procedures or you might have them call you might hear them be called functions And there was also this very special role known as the database administrator I'm not sure if they actually exist of their real or not But they do amazing things with hate they do amazing things with keeping these databases alive and healthy I sat next to one during one of my Graduate placements in my time when I when I was in the bank and that's probably why I learned a lot of stuff. I'm doing now So what did that lead to that led to an architecture called the fat client in which The UI application spoke directly to the database We had a new engineer join my team recently and it kind of struggled to kind of grasp the concept of like no The UI has an open database connection to the database there's nothing in between and Your roles and permissions are the databases of roles and permissions and if the database is down the applications down But that but that pattern wasn't just present in kind of these enterprise Hituations and even in the early days of the webs and even into like the mid to early 2000s things like Apache Servers were kind of doing everything all in one place. So you had your Apache Serving the web traffic then you had something like hot PHP and then that was talking to a My high-sequel instance it was all done on what on one on one place Then we kind of moved to what's the three-tier architecture and now the extension of it in mesh networks and Microservices and that's where we basically put something in between the end destination of the client and the data store there and They can be a server a cash or proxy or some some other client Typically the end points on an enterprise messaging bus if you're in enterprise or if you're in the web just HTTP HTTP brokered protocols and this decoupled Her data storage layer from that application lawyer from the application layer and that was really important because This then kind of allowed the two streams to kind of develop at their own pace and this led to the explosion in all these web Tools and technologies and you know, I'm obviously showing some of the more modern Python web frameworks But as Naomi mentioned downstairs, you know These Python has always been involved in web frameworks So that's great Now let's talk about state and the problem of state now Application code has been in source control for a long long time We've all we have won the battle against getting people to use source control. Maybe it's still waging on in parts of academia But source control is the standard and everyone's very comfortable there everyone knows how to use it be it, you know gets a version mercurial whatever your flavor and That's helped by the fact that application code itself is or Let's call it application codes such as Java or or Python or any application language So to speak is stateless in the sense that data just transiently passes through it. It doesn't reside in it and That means that you can kind of arbitrarily go backwards and forwards in time traversing your tree of History to kind of get back to a known state which you it might not necessarily be working But at least you know what it is Coming to and there's also just amazing tooling and support for that type of development environment You come to databases. I think in my view, it's still an unsolved problem How do we bring source control or the concepts of source control into the world of databases? because the head of bases hold the data as well hold the code as well as the data and The data is a resident in that database and points in time are actually tied to the particular state of the data At any one point so trying to go backwards and forwards you need to do it in a way that is Infected by the data or once you pass a certain point you can no longer go backwards in time And that's a bit of a problem. There are some good tools for managing the structure of the data So the metadata that describes the structure of the database But I think we're still trying to find our way into hanging out how to do that for the data inside it So where is that? Leave us. Is it just best to assume? Nothing of a database and just treat it as a dump container the hitter in the hitter out Well, it depends and actually that's been the approach for a long time and it's served us very well That's what we've been doing SQL and applications today. We make use of a lot of Objur Correlational Happes of those are tools if anyone's used Django, you'll heard of her Django RM But the other ones as well like Like secret alchemy is probably the most well-known and the basic structure is you have your models and you talk to RM Back end it takes care of going to her database driver to do your Communication for you and that's actually quite a good pattern for this whole wave of web-based technologies Because you normally want to model something happening, you know a user doing a process putting something in a checkout basket Can they do that can they not you do really you really are dealing in objects and relations and you are genuinely making a mapping for them But SQL and applications today. Can we do more? You know these are very high performance machines Especially now you know now they kind of go over a hump of kind of being in her shadows for a bit You know, these are normally written in low-level programming languages like see us keep us plus If the shame you know not to not to make use of what they can do So we can work with Jason types we can work with XML You can also do things like if you can do more stuff in the database when you can do your computation where the data actually resides That's great. But also it also saves you network traffic in the days of so in you know in these days Where we're just hoarding and using so much data in our modern web technology stacks or even any technology stack That number does add up You can I think the killer feature of all relation databases that you can guarantee and enforce the constraints of Her business logic in that database and it can never go wrong and it saved me a number of times From some things that I did that went that best advised So how can we use all of this in a simple and easy way from our Python code? How do we saw and how do we solve the source control problem at the same time? So? Let's take a look. Let's quickly taken a side. This is a a snippet of a Jason from a Pokemon API. It's quite fun if you want to use it actually so this is an entry for Pikachu and With something like Postgres and most of the other databases do also in do also implement these Obvious these access methods as well, but they're not quite the same syntax So I can directly go into her Jason which is stored in a row in a column And I can get the attributes and I can unpack them and then I can extract a particular sub object So that's the type of stuff that we're able to do so no longer bringing something in as a dictionary and then doing Dot keys dot items and then Traversing stuff hundreds of thousands of times So this point I want to introduce you to sequel pie. It's an open source library Which I've been working on for maybe a year now you write sequel next to your application code and it does kind of break the paradigm of Having something that isn't Python next to your Python, but it's separate It isn't mixed in with your Python and that's basically the idea behind it we had to something had to give somewhere and what it does for you it prepares a functal a functools partial for you and So that you don't have to have your query strings randomly to heart it around all of your code At least it can be in one place. It's very lightweight. It doesn't make any It doesn't ask any questions of you It just tries to get out your way and let you logically organize your query strings rather than try and Tell you how they should be executed or that kind of thing and the idea is not new The kind of concept came from a library called yes equal which is written enclosure Also by a fellow Londoner open source enthusiast So do go check that out as well. I'm there are lots of ports lots of ports for other libraries as well and It's in source control your queries can be in source control now And Why do we need this because that awkward moment when you need SQL? So this is a I have picked the example to be to look quite hard But this is a sub query using something like SQL alchemy and what it is you need to define two objects one that defines their inner Object and we have the query and then you need to join them together And this actually is like pyphonic is quite nice to read. It's fine But the semantics are kind of leaking a bit between each other So at the bottom of the sub query The order by and the group by now if any of you know sequel those are optional words But the fact that the thoughts of the order buys after the group by does that mean that you can't use Order by unless there's a group by that's not really so it is leaking slightly, but it's not the worst, you know Accusation of it. So but that's just where we're starting to Hit these barriers of when maybe we need to turn to using SQL strings ourselves Directly, how do you do the same thing in SQL? It's that top one there. Just a few lines But also this one's quite funny because it is actually quite easy to do it just in Python Just with is it tools? So this is a case where these complex data access methods Are hard of things like object relational methods, but are easy in either sequel or in just plain Python So there's a choice either way If you still like to tie things to models then you can basically freely make your own models So you can use a serializer like marshmallow and basically just if any of you have worked with Java or C++ The data access object pattern in which you define your object And then you basically encapsulate all of the things that go and get things in another database in that object Enterprise being as well, you know, I don't want to give too many people bad bad memories But that's kind of a that that that kind of stuff is totally in your hands as well SQL pie composition, this is basically the thing which I only need I only realized I wanted about six months after making the first version So the first version of sequel pie you can only just do a query string as it was you couldn't make an assemble Different versions of the query depending on what you are doing it and quite rightly So it is quite a hard thing to do and it's quite a dangerous thing to do You don't want to be passing strings into your into your db2 API without, you know, checking that everything is fine so the solution to it was something I called Built SQL in which the clauses and are added depending on the data which you passed to it So if there are certain keywords present it knows how to construct it so that your ands all in place So for example select star from where and and and and it will only make those ands When it has the right data, but importantly still passes through your particular Library of choice to make sure it's all safe and and at the end of day So how's that look this is an example table So on the top is the query that you'd see in your queries file That's kind of in your in your host code You can recognize the placeholders there in keyword argument form then you would just pass in args as a Dictionary and it would know what to assemble at the end of the day Beyond the web apps so python is taking over so There's a multitude of tools that have made python interface language of choice for so many things But the glue to hold things together has been bash scripts, which I'm learning more and more about every day I mean it's quite amazing tool or Piper scripts But why not use Python all the way through our systems? Since you know where we're all here and we all enjoy you and we all enjoy making use of Python Can we make use of it? More thoroughly in our applications and in our in in our systems So show of hands who here uses Python in a data science context And that number of quite a lot of people so even if you haven't been convinced by what I'm saying I still like my files are still like my ORM. It doesn't matter because you don't have a choice Let me explain why Moving data from A to B is kind of a very frequent task and they hate the cleaning is and Transformation is something that we probably spend too much of a time doing that we love to hate and object ratio mappers are sometimes not very Useful in that sense because you're not really doing entities We're kind of on the fly taking things out in the V attributes We don't want to have to go make a new CSV file every time and stick it in is to get it's kind of a whole Palabra so something like sequel pi can help In that because you can put it in your Python code where you're actually doing your experiments And you can run it from my Python you can do whatever you like a bit Because all it really is is just assembling your queries and at least you know where they are that they're all in one place So why did I say you you have no choice? Because all of these modern new kind of cloud native data stores have all decided that sequel or sequel like variants are going to be their language of choice for allowing you to describe the interface as to how to get to as to how to get to Access parts of that so just the they're not implementing the full sequel standard But they're implementing quite a lot of this so you select severe your your your forms and your ways are all there And just to feed examples Kafka has case equal spark has spark sequel big query has what a query and Even a divest has something for using SQL to search what essentially unstuck file objects So that's pretty cool So my last slide I'm just going to see how we can use sequel pie to also play with these So since sequel pie has no opinion as to the back end or the store which you are using It just expects something which kind of has the methods defined in the dv2 API on her cursor So stuff like execute fetch one fetch all So let's take a look if this will work and let me transfer It's after do it from the screen. So what I have here Let me just zoom in so you can see it. How's that let it Cool, so let's take a look at so what I've done. I've done a very quick Basically, I followed the tutorial from Google cloud on on pay query And this is they have an open data set of stuggle of low questions And basically this you can see what I was getting out by having SQL strings inside your code So this is piping code with quite a significant proportion of it taken up by things that are strings and It's fine to read but imagine if that file suddenly became really really long. So all I'm doing I'm just importing the big query Library, I'm just making my query the string. I'm initiating the clients Well, I'm right. I'm running it and I'm printing it. Let's go over to overtime to see what it looks like So I need to run And No sequel pie So all that's gonna do it's just gonna print out the top questions that had the tag Python in them So that's what that looks like. So if I switch back quickly to the other tab And show you what is in The other file which is run sequel pie Much neater and all I've done at the top is I've just kind of just defined a mock Object this idea only came to me the other day So I'm kind of it's there So it all I have to do is just define something to somewhere to put the results and execute method Which is basically taking what the What they need from Google cloud and a federal method and all the main function is just been reduced down to these 5 or 6 lines will have to print. So finally as my last show Let's run that Run sequel pie and hopefully should be exactly the same There we go so Equal pies out. I was trying to get versions is here point free in time for today It should I just need to update the doc so wait till tomorrow. I hope you have done it by tonight And thank you very much Thank you very much us We've got a couple of minutes for questions and as typically happens is always on the other side Um, can you have multiple queries per file because I just yeah, yes, you can yeah, I didn't see that the API could Basically, you can either pass a file with lots of queries in it Just separated by new lines and it can figure out. It's all explained in the docs They're very comprehensive or you can pass an array of different files survey if you want to keep them separate So if you're using blueprints, it's nice to keep one query per blueprint And then it and then it gloves them together and then you have your object So I'm still working on that because kind of you have the whole query object for every blueprints Maybe we should spit it so contribution is welcome. This is my first open source thing I've been trying to get something open source for a while trying to get it back in some way and this is my first attempt Thanks for the talk Could this work with something like a sync IO because the databases performance is always key We do use it in a sync IO application, but we it wasn't like oh, let's Specifically going we just happened. I didn't even think about it. I just yeah, we have a sync IO Process that runs and puts them in database. So yeah, I guess it doesn't work Because the all of the connect all of the connections kind of is done by it's handled by the eb2 API so all of the constraints about file IO are a matter of with the with the Psycho Pg or C or MS SQL there's all those different libraries Any more questions Hey, well, thank you very much. Oh, well one more. So can you give us the slide stack Yes, so if you At the bottom so hustek is where is what I'm on github or Twitter and stuff. So You'll find it there Awesome. Thank you very much husts fantastic