Good morning, please welcome our next speaker, Honza Kral. He'll be talking about designing a Python interface.

Thank you. Hello, and good morning. As has been said, my name is Honza Kral and I would like to talk to you about designing a Python interface. A better title for my talk, which only occurred to me later, is probably "An illustrated guide to this", because you all know what happens in Python if you type import this: you get a sort of mission statement, a set of guiding principles for Python itself and for Python code in general, of how it should look and how it should behave. That is what this talk is really about: how to take those principles and apply them to designing an interface to a foreign system.

So first a little motivation, what brought me here, and that's Elastic, the company that I work for, where I create a Python client for Elasticsearch. It's a search and analytics engine and it's not at all important for this talk, but what is important is that we'll be using it for some examples. So don't be alarmed if you see some JSON somewhere in there. So let's get to it.

So, import this, the Zen of Python: as I said, a guiding set of principles. I always liked them, they sound cool and they really make sense to me, but I always struggled a little bit: how do you actually apply them? How do you apply them to code? Does that actually translate? Does it make any sense? In this talk I would like to share how I actually discovered (or maybe I just rationalized it after the fact, it probably doesn't really matter) how I use these principles when I design a new API and how I apply them in real life.

And because I said "I" a lot in just that past sentence, a little bit of a disclaimer is in order. This is obviously my personal opinion.
This is what I find works for me. Also, some of you may have seen my code and documentation, or lack thereof, so please do as I say and not as I do, because I certainly am not perfect and my code definitely reflects that.

Any good talk begins with a definition, right? So API: what do we mean by an API, by an interface? For this talk in particular, we mean a Python API, an interface to a foreign system, so something that will allow you to talk to a third-party system, in our case Elasticsearch. This is something that typically you don't need in Python. I mean, Python is Turing complete, right? You can write absolutely anything in Python. So you don't need these interfaces; you can communicate directly, especially in a case like Elasticsearch, which just speaks HTTP and JSON. You can just use requests or any other favorite HTTP library and talk to it directly. But that's not really what you want to do. You don't want to keep reinventing the wheel, and that's why we have APIs: somebody created an API to make things simpler, to hide away the complexity.

So you might say that an API, in this case, is a service for code: for the real code, the code that actually does the work, the application code. And there is a huge difference between the code of the API and the code of the application, because the API isn't written for a specific use case. The API doesn't know anything about the real code, so it can be used by many different people in many different organizations, and hopefully it will be. That's what you hope for when you create an API, right? That people will use it, that it will spread. The application, on the other hand, is always written to solve a specific purpose.

So the API fulfills a contract for that code, and the contract can be either explicit or implicit. It can be explicit via documentation.
You can document the contract. You can say these are the methods, these are the responses, and that's probably something that you should have either way. But you can also have a sort of implicit contract, a cultural one if you will, which makes the API behave as you would expect it to. It should be natural, and you should be able to reason about the API and avoid any surprises. That's what you want for the users of your API: to be able to use it as if it were part of their system, without having to think about it too much.

But first of all we need to address some of the issues with an API. The first one is that the API is vaguer than application code. And yes, I actually had to look this up; it is an actual word. I wasn't sure for a while, but it is accepted by the dictionary. Okay, the British protest again. What a surprise.

So it is a lot vaguer. It doesn't really know what it's going to be used for. Application code is specific, it solves a problem: you create an application to scratch an itch, to deliver a solution to your customer, or to power your product. But your API can be used in so many different ways that you never know, and making any sort of assumptions about how your API is going to be used can be very difficult.

So ultimately the API is just a tool, and you always have to keep that in mind: it's a tool that anyone should be able to wield to create something, and it's a tool to simplify access. The crucial part there is "to simplify", and here we have our first line from the Zen of Python: simple is better than complex, complex is better than complicated. That is the purpose of the API, to actually facilitate this line: to take something that would be complicated, like opening a socket, creating an HTTP header, sending it over, creating a JSON body, sending it over, receiving some responses, determining if
everything went okay, et cetera, et cetera. That's definitely unnecessarily complex, and that applies to anything. If you want to work with HTTP, you also probably wouldn't start with a raw socket; you would use something like requests, a well-designed library that gives you exactly the functionality that you need in a way that you can use. So that's what simplifying access is about: it's about hiding complexity.

So this is a query to Elasticsearch. Don't worry if you don't understand it. Essentially, I'm looking for something that matches "python" in the title. It must not match "beta" in the description, so I'm looking for a release. I'm filtering only those packages in the category "search", and I want to do some aggregations: I want to see the distribution of tags and the maximum lines. So just some query, assuming a data set, and that query is not important. What is important is that there are a lot more things on the screen than what I just read, and I read everything that this does. In the end it just prints out the relevancy score and the title of the document. So there isn't really a lot actually going on, but there are a lot of things being typed there. So how do we simplify that? How do we hide the complexity?

Well, this is what I came up with, and I'll spend the rest of my talk explaining to you why and how I came up with it. I already see some people cringing because they've used this code before. That's never a good sign.

Okay, so in this case I tried to extract only the things that are relevant, only the stuff that I actually read, only the action items, not all the gravy, all the boilerplate code.
To me, that's what an API should do: it should hide away all the boilerplate while leaving, and that's the crucial part, all of the important parts. Not just some, but all of the important parts. And that's the embodiment of another line of the Zen of Python: explicit is better than implicit. So while I hid away a lot of the complexity, I didn't hide away the crucial parts, and that's the very important decision: what to hide and what not to hide. Because you can always go to the next step. You can always imagine how you could make this even simpler, for example by creating some sort of query language where you would just say the three words and say this should be there, this shouldn't, and this must be there as a filter, etc. But at that point it's getting hard to read, hard to get into, and harder to reason about.

So be explicit. In the world of code, what that means is: when you're hiding complexity, do hide the mechanics, do not hide the meaning. If you're doing an HTTP request with requests, you still have to know the difference between GET and POST and PUT. That is the meaning; that is not the mechanics. What you don't really want to know about is sockets and the parsing of HTTP headers. You just want convenient access to it. So that's mechanics, that's something that you should hide away, something that's not specific to the problem but specific to the implementation. This is how you draw the line between what to hide and what not to hide.

So if we look at the original code, you see I've highlighted the parts that are actually just the mechanics. Sorry about this. You see that I have a bool query, which has three branches, must, must_not and filter, and those can be very confusing. Also, they don't matter. They don't carry any information.
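To make the mechanics-versus-meaning split concrete, here is a minimal sketch in plain Python. The helper names `match`, `term` and `combine` are hypothetical and are not the API of any real library; only the `bool` / `must` / `must_not` / `filter` layout comes from the Elasticsearch query DSL mentioned in the talk. The caller still chooses between a match and a term query (the meaning), while the nesting boilerplate (the mechanics) is generated.

```python
def match(field, text):
    """Full-text query: the caller still decides that this is a match."""
    return {"match": {field: text}}


def term(field, value):
    """Exact-value query: the caller still decides that this is a term."""
    return {"term": {field: value}}


def combine(must=(), must_not=(), filters=()):
    """Wrap leaf queries in the bool boilerplate (hypothetical helper)."""
    branches = {"must": must, "must_not": must_not, "filter": filters}
    # Only emit the branches that actually contain clauses.
    return {"bool": {name: list(qs) for name, qs in branches.items() if qs}}


query = combine(
    must=[match("title", "python")],
    must_not=[match("description", "beta")],
    filters=[term("category", "search")],
)
```

The three keyword arguments carry everything that was read aloud from the slide; nothing about how the clauses are nested has to be typed by hand.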
Those branches are just the way that the query DSL, the language that Elasticsearch uses, based on JSON, expresses how to combine other queries. It is the how, not the what. This is not what you want to do, this is how you want to do it, and this is something that I don't want my users to have to know in order to use my library. However, I am fully comfortable forcing them to know all the rest. So that's all that I hid. I never hid the match or the term query. This is the meaning: you still have to know the difference between a term query and a match query, and how to do a negation. In this case the negation is in Python, so we should all be familiar with it. But that's the meaning. That's the same as requests still asking you to know the difference between HTTP GET and HTTP POST.

If I hid this away, it would make a lot of people's lives easier, but then you would have a very low ceiling, after which there is nothing you can do. Also, this means that I don't have to teach my users everything. They can just use the skills that they already have from understanding the query DSL of Elasticsearch itself, or, in the requests example, from understanding HTTP and HTTP methods. I don't have to reinvent that. It's already there, people already know it, so all I have to do is give them access to it.

A similar thing goes for the results. Here I have some crazy dictionaries with _source and _score. That can be a little difficult, and again, it just bears no meaning. 90% of the time people just want access to the fields or, alternatively, the meta fields.
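The result-access problem described here can be sketched with a small wrapper. This is an illustration under assumptions, not the library's implementation: the class name `Hit` and the `meta` property are chosen for the example, and the raw hit below is made-up sample data in the `_source` / `_score` shape the talk mentions.

```python
from types import SimpleNamespace


class Hit:
    """Read-only view over one raw search hit (hypothetical helper)."""

    def __init__(self, raw):
        self._raw = raw

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._raw["_source"][name]
        except KeyError:
            raise AttributeError(name) from None

    @property
    def meta(self):
        # Expose _id, _score, ... as meta.id, meta.score, ...
        return SimpleNamespace(**{
            key[1:]: value
            for key, value in self._raw.items()
            if key.startswith("_") and key != "_source"
        })


# Made-up sample hit, for illustration only.
raw_hit = {"_index": "packages", "_id": "42", "_score": 1.5,
           "_source": {"title": "Python client"}}
hit = Hit(raw_hit)
print(hit.meta.score, hit.title)
```

One trade-off of this sketch: a document field that happens to be called `meta` would be shadowed by the property, which is the kind of special case a real library has to decide about.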
So again, just abstract it away: hide the raw structure and provide a more convenient way to access it. Instead of square brackets and quotes and _source in every single line, just use the title. This is something that simplifies the mechanics while not actually taking anything away.

And also, it's good to fully admit to the leakiness. In this case I'm still showing all the different query types and all the different aggregations. You see that I still force the users to name their aggregations just as they would in the query DSL, so they can then get them back in the results. It's a very thin abstraction layer, exactly because I want people who use Elasticsearch in some other context, maybe from another language, maybe directly through the browser or the command line, to be more effective. I don't want them to have to learn yet another tool to do what they already can do. And once they learn this tool, I don't want that to go to waste if they have to change to something else or if they have to ask for advice online. If I created my own complete query language and then had to ask someone on the internet, "This is what I'm doing with Elasticsearch, can you help me?", well, the answer would be no, because nobody else would understand it. So the standardization, the leaky abstraction here, is very important and very deliberate. And it is also because, well, I'm lazy and I don't want to rewrite the entire documentation of Elasticsearch, what query does what. There's that, too.

So that's another guiding principle: be familiar. Present to the user something that they know from somewhere else,
whether it's a universal concept like the different types of HTTP requests or the different types of queries in Elasticsearch, or something that's already been used before. So don't be afraid to just copy shamelessly from stuff that you've seen around. Essentially you could sum up the library that I created, elasticsearch-dsl, as a combination of these two things. The second one we've already seen, that's just using the raw Elasticsearch API, and the first one, well, that's Django querysets. So I borrowed some patterns from Django and some patterns from Elasticsearch and combined them together. For example the chaining, the fact that whenever you add a filter you get back a copy of the query, comes from Django. That's something that people expect, or at least are familiar with. And then you have all the different query types, which come from Elasticsearch. So Python people who want to use Elasticsearch should be familiar with both of these concepts, and the API should feel homey to them, should feel familiar. They shouldn't be surprised, and they should be effective.

So once I did that, I turned to another rule: be consistent. Once you figure out your approach, stick to it. Special cases aren't special enough to break the rules. So every single method that you have on the Search object in elasticsearch-dsl will return you a copy; the chaining works just as expected. The other important part that people sometimes forget about when talking about consistency is the naming. Name things consistently: consistently with other systems, but more importantly within your own system. Always call it the same, both in your code and in your documentation.

And of course, only do this if it makes sense, because practicality always beats purity. In Python we are practical people.
We are not really interested in purity just for its own sake. So don't be afraid to make an exception. Try not to make one, but if you have to, that's okay. In our case we have the queries at the top that follow the pattern: every call to a method will create a clone of the query object, mutate it, and return it. However, when I tried to do this for aggregations, it just didn't work, because aggregations can be nested, and the first way people, me included, try to represent nested aggregations is just to nest the chaining calls. At that point it all broke down. I could no longer create a copy after every call, so I had to break the pattern. So when you access the aggs, it's actually done in place. So don't be afraid to break the rules. Try not to, but keep in mind that it might happen that you just have to, and that's okay. The Zen of Python says so, and smarter people than me wrote that, so I'm okay with it.

Another very important rule when designing an API is: be friendly. Be friendly to your users, on both sides: be friendly to the users of your API, but also be friendly to the system that you're trying to simplify access to. And to me the only non-obvious part (well, partly obvious, because it's still pretty obvious) is to realize that Python is interactive. A lot of people use Python from tools like IPython, or they use fancy IDEs to explore the APIs, and you should support this. You should help them with that by providing all the tools that they could ever need. In the case of Python there are three main ones: dir, representation, and docstrings, which, if implemented properly, will greatly help the users explore the API and start using it: both the beginner users who just came to your code for the first time and are exploring around, and also the advanced users, where this
can greatly speed up the process. For example, the representation string is often underestimated, and it can be super useful. One of the most common questions that I get about elasticsearch-dsl is: I have this crazy query in JSON that somebody wrote or some other tool generated; how do I express it in elasticsearch-dsl? Well, I say that's easy. You just create a query out of it by wrapping it in the Q object, and ask for the representation. What you get back is exactly the code that it would take to reconstruct it using just the DSL library. So that's what the representation string should really be: a representation of the object that you can essentially retype into Python and get the same thing. In some cases it's not practical, if you have large objects, obviously, or something that can only exist once; there it's not that useful. But in this case it is definitely very useful, and it saves me a lot of time, because this is something that I myself use quite often. I have this crazy dictionary containing a crazy query with 10 different subqueries and 50 aggregations, and I want to manipulate it, and manipulating the dictionary itself is quite difficult. That's why I created this library in the first place. So I just wrap it in the Q and get a query object that I can work with, and then when I want to put it in my code, I can use the representation and put it in there.

And of course, docstrings. Be nice to your users: provide even some examples in your docstrings.
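The round-trip property of the representation string described above can be sketched with a toy class. This `Q` is a stand-in written for illustration; it is not the real elasticsearch-dsl object, and the real library's repr output may be formatted differently. The sketch also assumes that parameter names are valid Python identifiers, which is what makes the repr retypeable.

```python
class Q:
    """Toy leaf query whose repr can be pasted back into Python."""

    def __init__(self, kind, /, **params):
        self.kind = kind
        self.params = params

    @classmethod
    def from_dict(cls, raw):
        # A leaf query is a one-key dict: {"match": {"title": "python"}}
        (kind, params), = raw.items()
        return cls(kind, **params)

    def to_dict(self):
        return {self.kind: dict(self.params)}

    def __eq__(self, other):
        return isinstance(other, Q) and self.to_dict() == other.to_dict()

    def __repr__(self):
        args = [repr(self.kind)]
        args += [f"{key}={value!r}" for key, value in self.params.items()]
        return f"Q({', '.join(args)})"


# Wrap a raw dictionary, then ask for the code that rebuilds it.
q = Q.from_dict({"match": {"title": "python"}})
print(repr(q))
```

Evaluating the printed text yields an object equal to `q`, which is the property the talk asks a representation string to have whenever the object is small enough for that to be practical.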
Examples are the most useful part. If you're reading the header of a method like this one, it's pretty evident what it does, but if it were anything more involved, with more parameters, it's always nice to include an example right there, both for when somebody is reading the code and for when somebody is just looking in their IPython at what's actually going on. So that's one part of Python being interactive. That's the more obvious part.

The second part is: enable iterative build. Because again the Zen of Python teaches us that flat is better than nested and sparse is better than dense. So what does that mean? If you want to build a sophisticated query for Elasticsearch, you keep adding clauses: first filter on this, then on that, add this aggregation, and if the user requested this filtering, add this kind of filter. So there is a complicated state that you need to remember: what the query currently looks like. And it's the sign of a polite API, as I would call it, that it doesn't force you to remember the state but can store the state internally. So you can use the API from the get-go and start building your query, in this case, or your request in other cases, and you don't have to store everything yourself.

In our case it looks like this. You can also see that this enables nice practices like commenting the code and actually explaining to the users what they're doing. I can go line by line and very easily deconstruct it. Even if you're new to Elasticsearch, you will probably be more or less able to tell what this does, especially with the comments. Sure, there's still some magic.
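The slide itself is not part of this transcript, so the following is only a sketch of the iterative-build idea in plain Python. The method names `query`, `filter` and `exclude` are borrowed from the style of API the talk describes; the internals, and the `build_search` function with its parameters, are assumptions made up for the example. Every call returns a modified copy, so the object carries the accumulated state and the caller never assembles a dictionary by hand.

```python
import copy


class Search:
    """Toy search builder: each call returns a modified copy."""

    def __init__(self):
        self._clauses = {"must": [], "must_not": [], "filter": []}

    def _add(self, branch, kind, params):
        clone = copy.deepcopy(self)  # never mutate the caller's object
        clone._clauses[branch].append({kind: params})
        return clone

    def query(self, kind, /, **params):
        return self._add("must", kind, params)

    def exclude(self, kind, /, **params):
        return self._add("must_not", kind, params)

    def filter(self, kind, /, **params):
        return self._add("filter", kind, params)

    def to_dict(self):
        body = {k: copy.deepcopy(v) for k, v in self._clauses.items() if v}
        return {"query": {"bool": body}}


def build_search(text, category=None, include_beta=False):
    """Hypothetical application code building a query step by step."""
    s = Search()
    # Full-text match on the title.
    s = s.query("match", title=text)
    # Narrow to one category only when the caller asked for it.
    if category is not None:
        s = s.filter("term", category=category)
    # Hide pre-releases unless they were explicitly requested.
    if not include_beta:
        s = s.exclude("match", description="beta")
    return s
```

Because each step is one flat, commented line, conditional clauses are ordinary `if` statements rather than nested dictionary surgery.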
There's still something that's specific to Elasticsearch. We've talked about term and match and all the different weirdness of the syntax, but overall this should not be surprising: I want to filter only category search, then I want to match the title to python, etc. But the most important part is that I don't have to first build up some weird dictionary containing all the keyword arguments, or represent the state in any other way. I can just keep adding to the search object and be happy with that. So iterative build is often underappreciated, because it allows you to not care about the state, which can be very hard.

And finally, when we're talking about being friendly, safety is also very friendly. You should always fail explicitly, unless it's explicitly silenced. So all your defaults should be the safest possible. For example, Elasticsearch will always give you a response, even if only 20% of your data is available. It will give you a response and say, hey, I only see 20% of your data, but here, have some results anyway. And then it's up to the user to decide whether that's good or bad for them. When faced with a decision like this (do I fail in this case, or is it okay and I leave it to the user?) I say: always fail, but allow the user to override it. Allow the user to say, I'm aware that this is a situation that might happen and I don't care. At that point you force the user to explicitly own the responsibility, maybe even to do some research. You've noticed that this is something I've repeated quite often during this talk: don't be afraid to force the user to actually learn something about what they're doing. And this one is particularly important, because it deals with safety and it helps prevent nasty surprises once the user moves to production.

Also, think about how you test your
code and how other people test their code. Provide some sort of dummy interface, or maybe just a set of test case classes for your favorite testing libraries out there, so that people have a lower barrier to entry to do some testing. Essentially, obey the testing goat: make the testing as simple as possible, because that's what you want to do. Ideally you already have some code like that somewhere in there, because you should be testing the API, so it's only a matter of exposing that part to your users as well, as test helpers or something like that.

And the final chapter is about the API still being code. It's different from application code, and we've highlighted how, but it's also similar in a lot of ways. It needs to be tested. Things can change and you might need to adapt. Also, there are a lot of decisions going into how you decide what goes in and what doesn't. What are the features that you want as part of your library, as part of your API, and what do you leave as an exercise to the user? Do I provide this set of helpers? Do I expose this functionality, this parameter, or is it only used by very few people? There are several ways to do that. First is to actually make the decision, but the second, and that's my favorite, is to avoid that decision and allow the more advanced users to always step away a little bit, and provide them access to the lower layer, the more low-level API. In the case of Elasticsearch, I can always go one step back and just send in a raw dictionary. If I don't want to create the query using the iterative syntax that I just showed you, I can always just create it myself. And at that point I don't have to care which options are supported by the DSL and which aren't. I can just do everything manually and send it in.
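The "step back to a raw dictionary" escape hatch can be sketched as follows. The class `Request` and its methods `match` and `extra_raw` are hypothetical names invented for this example, not part of any real client; `min_score` merely stands in for any option the helper layer has no dedicated method for.

```python
import copy


class Request:
    """Toy request builder with one helper and a raw escape hatch."""

    def __init__(self, body=None):
        # Accept a complete raw body, so users can bypass the helpers.
        self._body = copy.deepcopy(body) if body else {}

    def match(self, field, text):
        new = Request(self._body)
        new._body["query"] = {"match": {field: text}}
        return new

    def extra_raw(self, raw):
        # Merge arbitrary top-level keys the helpers know nothing about.
        new = Request(self._body)
        new._body.update(copy.deepcopy(raw))
        return new

    def to_dict(self):
        return copy.deepcopy(self._body)


# Helper for the common part, raw dictionary for the unsupported part.
request = Request().match("title", "python").extra_raw({"min_score": 0.5})
print(request.to_dict())
```

With a hatch like this the library author can decline to add a method for every option, since the unsupported case costs the user one dictionary rather than a second client.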
Again, it's not a novel idea. It's the same as the .raw method on Django querysets, where you can just send in a SQL query when you don't want to rely on the ORM to generate one for you. So always allow access to the lower level if at all possible.

And also admit that no code is perfect. That goes especially for APIs, because they are vaguer than the actual application code. When deciding what to include and what not, keep in mind the last line of the Zen of Python that we'll deal with today: now is better than never, although never is often better than right now. That's an important thing. When you don't know whether to include something or not, it's perfectly fine to say no, especially if you have a way for the user to work around you. So I always prefer to give them a way around my code rather than try to support all the different possible avenues through my code, because that can lead to a nightmare. It can lead to a nightmare for my users, with an overwhelming number of options, and definitely to a nightmare for me, supporting all the different combinations. So think about what makes sense and how hard it is to do it without direct support in your code. If it means that, when I'm not supporting this option, a user will have to create a dictionary and send it in manually, I'm okay with that. If it means that they will have to instantiate a new connection and talk to a socket directly and do some other complicated stuff, maybe not so much, and I want to provide them that functionality directly. So that's the last part of how you decide what to support and what not to support.

And now I believe we have 10 minutes for questions. If you don't get your questions answered, or you just want to yell at me, there is my Twitter. Does anybody have any questions?

Thank you for your talk. I had a question.
So, if we create a great API with expressive syntax, don't you think that we're making the internals of the API more complex and really hard to maintain? Can we avoid that?

Yes. There is not necessarily a correlation between the complexity and the ease of use. Yes, sometimes you might need to resort to more complex things, like using metaclasses and descriptors, if you want to make things nice, but it's not necessarily true. So it might happen, but it doesn't have to. There is no direct correlation there.

Thanks.

Anyone else?

Hi. What role do you think emulating built-in objects, or using Python types like the built-in types, plays in writing a Python API? For example, when do you think you should use dictionaries a lot, or should you instead make a little DSL for your particular thing?

I see. So I prefer to, again, decide what you can take away and then compare the results. So if I use raw dictionaries, and if I create my own thing, what part of the pain goes away? Is it enough to justify the extra dependency, the work that will go into the API, etc.? Just do this exercise in your mind: okay, these things I can abstract away, these things I can generate automatically, so the user will only have to put in these four things. And then compare the results. It might be that the difference wouldn't be big enough to justify creating a DSL, creating a library, from the point of view both of putting the work in and of forcing people to learn it. So that's what I do: every time, I mock up how the API would look, and then I compare. Is it worth it or is it not? Does it make sense?

Okay, I think we have time for one more question. Okay, if you have any more questions, you can just grab him at coffee or during lunch. Let's give him a big hand of applause. Thank you.