All right, I think we're ready to go. There we go. So we are live. Welcome back! I hope everyone's got a fresh cup of coffee, or whatever your beverage of choice is. Welcome to the afternoon of the second day of EuroPython 2020, and good morning to everyone in the Americas, myself included, where it is currently 6 a.m. on the West Coast. And of course, good evening to everyone joining us from Asia.

So we've got some awesome talks lined up for the next few hours, and one of these is from Reuven Lerner, who teaches Python, data science, and Git to companies around the world. He offers many online courses, including Weekly Python Exercise, a community-based way to improve your Python fluency over time. He writes two free weekly newsletters, he's the author of Python Workout and co-host of the Business of Freelancing podcast, and he's going to be talking to us about something that I rather enjoy: sorting, especially sorting in Python. So this is going to be fun. Thank you, Reuven. Welcome.

Okay. All right. Well, hi everyone, all over the world, online. Welcome. Sure, we can't meet in person this year for EuroPython, but I must say, kudos to all the organizers. I've been supremely, supremely impressed by the conference, and by the back end of it: from the speaker's perspective, it's really impressively organized.

As was said, my name is Reuven Lerner, and my talk is "How to sort anything". A few words about me, as was also said: I do corporate training, and I have all sorts of online courses. I just published my first book with Manning, Python Workout. I actually have the hardcover. Remember hardcovers, folks? That's right. I have my weekly Better Developers newsletter as well, which I think currently has about 18,000 subscribers, with a new article about Python each week.

So, sorting. Sorting is important.
You might say it's sort of important. Okay, get used to my humor, folks, for the next half hour at least. Sorting is important. Sorting is something that we do all the time, for a whole lot of different reasons and in a whole lot of different ways.

So why do we want to sort? Mixed in together here, I have some more abstract ideas and some more concrete ones. We might want to display our data nicely: let's see all of our data in calendar order, from January through December, or in order of years, from earliest to latest. Maybe I want to make my messy data slightly less messy; I want to be able to look at it and understand what's going on. I also might want to find the largest or the smallest value in a collection, right? What was the smallest amount that I made from a purchase in the last year, or the largest amount? We can see which products sold best or worst. Or assume your company has got a bunch of potential suppliers: which supplier's proposal will cost you the most? And then, if you're using Waze or another sort of GPS system, you want to find the closest gas station. Do you want to find the closest one? Do you want to find the cheapest one? Do you want to find the one that will take you the least amount of time to get to?
These are all things we do with sorting. Maybe you're watching Netflix, because you haven't left the house in six months, and you want to find a film similar to the one you've just watched. Well, there you go: we can use sorting for that. Maybe you're on Amazon and you want to find, or they want you to find, products similar to the one you're looking at. All of these involve sorting.

And Python actually makes sorting really, really easy. If you have a list, you can use the sort method. So basically, I can define my list here: my_list is 10, 5, -3, 7, -2, 4. And then I'm going to print it out. I'm using the cool Python 3.8 f-string syntax, where I can have, inside the curly braces, a variable name and an equals sign, and this will print out as the variable name, an equals sign, and then its value. So I'm going to print out the value before sorting, I'm going to run my_list.sort(), and then I'm going to print out the value after sorting. And sure enough, what do you know: before I sort it, the list is exactly as I defined it, and after I sort it, it's in increasing numerical order.

Okay, so far so good. Well, there are a few things to realize about list.sort. It's a list method, so it's only gonna work on lists. If you want to sort something else, tough luck: you've got to turn it into a list before you can actually sort it. It sorts from smallest to largest; that's the default, and we'll see how we can change that in a little bit. And here is one of the most important things: it changes the list object itself. So what we see here is my_list, again with the same exact values. Get used to them; we're gonna see them quite a bit. And then I assign a second variable here, also_my_list, to be equal to my_list. What this means is we have two variables referring to exactly the same list. That's fine. That's not a problem in Python.
We can do that. And I'm gonna print out the value of also_my_list, meaning the second variable, but I'm gonna sort my_list. So I'm gonna sort via my_list. What's gonna happen to also_my_list? And the answer is: before, they look exactly the same, and after, they also look exactly the same. That is to say, there's no difference between sorting my_list and sorting also_my_list, because you're not really sorting the variable; you're sorting the list that the variables refer to. And so this is one of the problems with using the sort method on lists: yes, it modifies the list, but that means all the references to it will see the change.

Another problem with list.sort is that it returns None. This is a common convention in Python, especially in the standard library: if you have a mutable data structure and you run a method on it that modifies that data structure, then that method will return None. It won't return the data structure itself. I see this all the time in my courses. People do this because it feels sort of natural. So we define my_list, we print what it is before, and then what am I gonna do here? Remember that in assignment, it's always right side before left side. So on the right side, I say my_list.sort(). That will actually sort the list. It will be sorted, but the return value from the method is None, and thus we have assigned None to my_list. And so the output for this is gonna be: before, it's our list, and after, it is None. So if you're short on memory, this might be a great way to solve those problems. Otherwise, though, not a good thing.

So what can we do to avoid this? Well, we can avoid it altogether. We can use the sorted method. I'm sorry, the sorted function. sorted is a function that's built in, so it's in the standard library's builtins. You don't need to import anything to work with it.
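To make the points about list.sort concrete, here is a minimal sketch using the list values from the talk. The names my_list and also_my_list follow the talk; result is a name chosen here purely for illustration.

```python
my_list = [10, 5, -3, 7, -2, 4]
also_my_list = my_list               # a second name for the very same list object

print(f"Before: {my_list=}")         # the {name=} form needs Python 3.8 or later
my_list.sort()                       # sorts in place and returns None
print(f"After:  {my_list=}")
print(f"Alias:  {also_my_list=}")    # the alias shows the sorted values too

# The common mistake: keeping the return value of .sort()
result = [10, 5, -3].sort()          # 'result' is an illustrative name
print(f"{result=}")                  # result=None
```

Both names refer to one list object, so the in-place sort is visible through either of them, and anything assigned from the method call ends up as None.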
It works with all iterables, not just with lists. So you want to sort a tuple, you want to sort a dictionary, you want to sort a file? You can use all of those with sorted. Now, sorted will always return a list, and by default it'll always return it sorted lowest to highest. And here's the greatest thing: it does not modify the source data at all.

So if I want to use it, what can I do? Well, let's start off with my_list again. I don't need to do the before and after; I can just print sorted(my_list), and then I'm gonna print my_list afterwards, just to compare. And sure enough, sorted(my_list) shows me the sorted values there, but afterwards, my_list still has the original values, meaning sorted returns a new list, and I haven't touched, I haven't changed, my original list at all. This means that I can sort tuples. How can I sort tuples? You can't change tuples, right? Well, we're not changing anything here; we're just gonna get a new list back, based on the tuple. If I sort a dictionary, I'm gonna get the keys back in sorted order, but I'm not touching the dictionary. And so on and so forth.

Now, how is all this being sorted? Many people ask me: is it using quicksort? Is it using merge sort? Well, I'll give you a hint: the sort algorithm was created by Tim Peters, and that's why it's called Timsort. And in case you think that Timsort is this weird, esoteric Python thing, it's not anymore. Timsort is now apparently the default sort method used in Java. And I know Java is, like, not going anywhere, but it does reinforce the fact that we've got a sort here that's working very well.

Timsort is this interesting combination of merge sort and insertion sort. Basically, it assumes, in general, that your data is not completely, 100% random; that if it's coming from the real world, it's gonna have some natural runs.
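The behavior of sorted described above can be sketched as follows. my_list uses the values from the talk; my_tuple, from_tuple, and d are illustrative names and data chosen here.

```python
my_list = [10, 5, -3, 7, -2, 4]
print(sorted(my_list))             # [-3, -2, 4, 5, 7, 10], a brand-new list
print(my_list)                     # [10, 5, -3, 7, -2, 4], the original is unchanged

my_tuple = (10, 5, -3)             # illustrative data
from_tuple = sorted(my_tuple)      # the result is a list, not a tuple
print(from_tuple)                  # [-3, 5, 10]

d = {'c': 3, 'a': 1, 'b': 2}       # illustrative data
print(sorted(d))                   # ['a', 'b', 'c'], the keys only; d is untouched
```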
It's gonna have some sequences that are already sorted inside of it, so it's gonna take advantage of that. It's gonna create a few runs, hopefully some of them being natural runs, and then merge them together. What happens if there are no runs? Well, then it's gonna create a run. How's it gonna do that? Insertion sort. Insert, insert, insert, merge; insert, insert, insert, merge. And that's how Timsort works, and it works pretty well.

Well, in order to do both the insertion and the merging, it's gonna need to do some comparisons. And so Timsort needs to know, given two items in our list, or in our tuple, or whatever: is a less than b, is a greater than b, or is a the same as b? And so Timsort is gonna rely on this. Now, before, we were sorting a list of numbers. Well, a list of numbers is gonna work fine, because numbers know less than, numbers know greater than, numbers know equal-equal. Trivial. That's great.

But what if we want to get a little more interesting? What if we want to sort a list of strings? So here I'm creating a list of strings. I have my string, I run .split() on it, and split, of course, returns a list of strings. Basically, it breaks up our string along whitespace into a list. So words here is a list of strings, and when I say print(sorted(words)), I'm gonna get the words sorted from lowest to highest. Well, lowest to highest: what does that mean? We as humans would think of it as alphabetical order, from the earliest in the dictionary to the last in the dictionary. So how is Python doing this? Well, it says: we can compare one-character strings, right? Remember, there are no real characters in Python.
It's all strings. So one-character strings can be compared with less than, greater than, or equal-equal, and the comparison is done based on the Unicode code point of that one-character string. Now, in case that sounds super fancy and difficult to follow, you can think of it, if you're a dinosaur like me, as just ASCII values: basically, every character has a numerical value. So lowercase a comes before lowercase j, which comes before lowercase z, and it will sort those alphabetically. Aha, but remember that in Unicode, and in ASCII, capital letters come before lowercase letters, and we'll talk about this a little bit later. So things that begin with a capital letter will come before things that begin with a lowercase letter.

In any event, given two strings, Python first checks the first character of each. I have two strings, word1 and word2. Which one comes first? We're going to check the first character, and we're going to apply this rule, the less-than rule or the greater-than rule. Right, let's compare their code points, let's compare their ASCII values, and we'll know then which one comes first. That's fine if we have one-character strings: we compare index 0. What happens if that's not decisive? What happens if they are the same? Then we go to index 1. What if those are the same? We go to index 2, then index 3, until we find a difference between them. This is also how we know if two strings are equal: if they're equal, then all the characters are the same, and, well, what do you know, neither comes first.
This also means, though, that if one string is the beginning of the other, the shorter string comes first. So if you have find and finding, find is gonna come first, because it's shorter. And if this sounds familiar to you, this is how we look things up in a dictionary, as I mentioned before. We look at the first letter, the second letter, the third letter; we can alphabetize things in this way. The cool thing is that Python does this not just for words, not just for strings, but for all sequences. And so when I want to sort a list of strings, it's going to use this dictionary-style formula of index 0, then index 1, then index 2. What about a list of lists? It'll also do index 0, index 1, index 2, and tuples the same way. Lists and tuples implement less than, greater than, and equal-equal in exactly the same way as well, where they look first at index 0, then index 1, index 2, and so forth.

Let's just check this. So I have two lists, list1 and list2: I have 10, 20, 30 and then 10, 20, 15. If we ask Python, is list1 less than list2, it'll say no, it's not. Is list1 greater than list2? The answer is True. Why? Well, check index 0: they're the same. So let's check index 1: they're the same. Check index 2: list2 comes before list1. Thus list2 always comes before list1, and list1 comes after list2.

What if I want to sort a list containing different types? So here my_list is 20, 'b', 'a', 10, 30, and let's print sorted(my_list). Fantastic! Oh, no, it's not. We get an error, and the error we get is TypeError: '<' not supported between instances of 'str' and 'int'. Meaning Python is saying to you: hey, dummy, if you give me a string and you give me an int, which is supposed to come first? I have no way of knowing that.
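The comparison rules just described can be checked with a short sketch. list1, list2, and my_list use the values from the talk; the words list and the name mixed_error are illustrative choices made here.

```python
words = ['zebra', 'finding', 'Zoo', 'apple', 'find']    # illustrative data
print(ord('Z'), ord('a'))       # 90 97: capitals have lower code points
print(sorted(words))            # ['Zoo', 'apple', 'find', 'finding', 'zebra']
print('find' < 'finding')       # True: equal so far, and the shorter one comes first

list1 = [10, 20, 30]
list2 = [10, 20, 15]
print(list1 < list2)            # False
print(list1 > list2)            # True: the difference is found at index 2

my_list = [20, 'b', 'a', 10, 30]
mixed_error = None              # illustrative name, used to keep the exception
try:
    sorted(my_list)
except TypeError as e:          # str and int cannot be compared with <
    mixed_error = e
    print(f"TypeError: {e}")
```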
Now, those of you who have used Python 2 might know that this actually did work in Python 2. You were able to use less than, greater than, and equal-equal in Python 2 between strings and integers. This was changed in Python 3 because it was such a hard thing for people to understand: it wasn't comparing them in a numeric way. It was saying all strings come after all integers, no matter what their values are. So if you knew Python's internal sort order for different types, it was fine. But clearly no one wants to keep that in their head, and so in Python 3 they said: if the types don't know how to compare themselves to each other, that's it, we're gonna give you a TypeError. So don't try to do this, or try it only if you know that it will not work.

Okay, what if I want to reverse the direction? What if I don't want lowest to highest, I want highest to lowest? So I'm gonna say my_list equals 20, 30, 10, and now I'm gonna pass a new parameter. I'm gonna say reverse=True, and reverse=True basically just swaps the order. So instead of being lowest to highest, it'll be highest to lowest, and indeed we see here that we get 30, 20, 10. So far, so good.

But what if I want to sort a list of words not alphabetically, but by their length? This is where things get really interesting and really cool. We no longer want Timsort to be asking, is word1 less than word2? Rather, we want it to be asking, is the length of word1 less than the length of word2? Right? Now, we don't want to sort the lengths here. Sometimes when people see this, they say, oh, what you want to do is use a list comprehension, turn all the words into their lengths, and then sort those. No, no: I still want to get back a list of words, but I want to get back that list of strings sorted by length. Right, it's like lining up children in a school by birthday, or by height, or something. Right, so how can I do
that? Well, we have the key parameter. The sorted function, and by the way the sort method on lists as well, allows you to pass a key parameter, and it takes a function. If we then say key equals that function, it's gonna say: oh, I'm gonna compare not a and b, but f(a) and f(b). And so if I want to sort the words by length, I can call sorted with key=len. Let's see that. So here I have my words, and I'm gonna say print(sorted(words, key=len)). Notice I'm passing a function as an argument to the key parameter. Right, and this is only possible because in Python, functions are nouns as well as verbs. Functions are objects, just like all other objects, except they have this added bonus: you can run them. And sure enough, now we get our words in increasing order by size.

So what else can be a fun key function? What can I set it to be? Actually, anything. Anything can be a key function in Python, so long as it takes a single argument and returns a comparable value, a value that can be compared with less than, and greater than, for that matter. So here are some examples. If I say sorted(words, key=len), just like we did, I can sort the words by length. I'm using len to sort the words.
I can also say sorted(numbers, key=abs). abs is the absolute value, so I'm gonna sort my numbers by absolute value, ignoring whether they're positive or negative. I can say sorted(words, key=str.lower), and here is that case sensitivity that I alluded to a little bit earlier. Basically, if I run str.lower on a string, I'm gonna get back a new string, which is the same as the old one, just all in lowercase. So this allows me to compare the words without taking case into consideration, meaning a case-insensitive sort. Now notice we can pass a method by passing it as a class attribute. So I'm not gonna say s.lower or word.lower, assuming that s or word would be the instance; I have to pass it via the class.

Now, a really common mistake that I see people make is that they put parentheses after the key function's name. They're so used to calling functions that they do something like this: sorted(numbers, key=abs()). Don't do this. First of all, it won't work, because abs requires an argument, and you'll get an error saying there's no argument. But second of all, remember that key expects to get a function; it doesn't expect to get the result of a function. So you really want to say sorted(numbers, key=abs), just like that, and we will indeed get back the numbers sorted in that way.

What if I want to sort a list of lists? Well, let's say I want to sort them by length, from the shortest sublist to the biggest sublist.
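Here is a minimal sketch of reverse and of the built-in key functions mentioned so far. my_list and numbers use values from the talk; the sentence behind words and the name abs_error are illustrative choices made here.

```python
my_list = [20, 30, 10]
print(sorted(my_list, reverse=True))     # [30, 20, 10]

words = 'This is a bunch of words for sorting'.split()   # illustrative sentence
print(sorted(words, key=len))
# ['a', 'is', 'of', 'for', 'This', 'bunch', 'words', 'sorting']
print(sorted(words))                     # capital T sorts before all lowercase letters
# ['This', 'a', 'bunch', 'for', 'is', 'of', 'sorting', 'words']
print(sorted(words, key=str.lower))      # case-insensitive
# ['a', 'bunch', 'for', 'is', 'of', 'sorting', 'This', 'words']

numbers = [10, 5, -3, 7, -2, 4]
print(sorted(numbers, key=abs))          # [-2, -3, 4, 5, 7, 10]

abs_error = None                         # illustrative name
try:
    sorted(numbers, key=abs())           # the mistake: calling abs instead of passing it
except TypeError as e:
    abs_error = e
    print(f"TypeError: {e}")
```

Note that the key function only decides the order; the values that come back are the original items, not the keys.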
I could say key=len. That's right, just like I did with strings, because len works on strings and lists in the same way. What if I want to sort by the sum of the numbers? Well, I can say key=sum, and then what's happening is we're gonna sum up each of these sublists, and then we're gonna order them from lowest sum to highest sum.

But I can do even cooler things if I write my own custom key functions. As long as my function takes one argument, and as long as my function's return value is sortable, comparable, then we'll be fine. And remember, the return value does not need to be the same type as the input. So I can sort strings by their length, because the len function returns an integer.

So let's see a few examples of this. Let's say I have a list of numbers here: 500, 2000, 100, 130, 1000, 40. And now I'm gonna say def by_digit_count(n). This is going to be my key function. This is the function that I'm gonna run on each element, and based on its output I will know, or, you know, Timsort will know, how to order things. So what are we gonna do? We're gonna take a number, turn it into a string, and get its length. And now I can say print(sorted(numbers, key=by_digit_count)). Now, I should add something here, which is my personal convention. I invented it myself, I think, I hope: always start the name with "by" when it's a key function. So I can then say I'm sorting by this or sorting by that. You don't have to do that, but you know you'll be really cool if you do; obviously your friends will love you as a result.

So what happens if I do this? Sure enough, I get 40, 500, 100, 130, 2000, 1000. I've now sorted the numbers by their digit count. But you might be saying: wait a second, the numbers are sorted by their digit count, but they're not sorted numerically. What the heck is going on?
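The sublist and digit-count examples can be sketched like this. The sublists data is illustrative; the numbers list is the one read out in the talk, as best it can be made out from the recording, and by_digit_count follows the description given there.

```python
sublists = [[10, 20, 30], [1, 2], [100], [5, 5, 5, 5]]    # illustrative data
print(sorted(sublists, key=len))   # [[100], [1, 2], [10, 20, 30], [5, 5, 5, 5]]
print(sorted(sublists, key=sum))   # [[1, 2], [5, 5, 5, 5], [10, 20, 30], [100]]

numbers = [500, 2000, 100, 130, 1000, 40]

def by_digit_count(n):
    # Turn the number into a string and use the string's length as the sort key
    return len(str(n))

print(sorted(numbers, key=by_digit_count))   # [40, 500, 100, 130, 2000, 1000]
```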
And the answer is that Timsort is what's known as a stable sort. That does not mean it only sorts horses. A stable sort means that if two items in the original list have the same sort value, they will stay in that order in the output list. So because 500 came before 100 here, 500 will come before 100 in the output list, and because, well, I guess, 2000 came before 1000 here, it's also going to be that way there. So you have to remember it's a stable sort. Normally it's a great thing, but you can be a little confused and surprised here.

What if I want to sort the sublists by their means, right, by their mathematical averages? So what we can do is write a key function, by_mean, and it'll take one list and return the sum divided by the length. And now I can print sorted(numbers, key=by_mean), and sure enough, we get them sorted by their means, from the lowest mean to the highest mean, if you know what I mean.

All right, what if I want to do something even a little crazier? I want to sort by the number of vowels per word. So I'm gonna have here words, which comes from a sample sentence of text. But what am I gonna do? I'm now going to define a new key function, by_vowel_count, and I'm gonna ignore the print for a moment. Let's talk about the rest: total equals zero, we iterate over each letter in the word, we turn the word to lowercase, we count up the vowels, and we return that. And sure enough, we can do this. But this print here: why did I put this print here? Don't do this in production, right, don't do this in a real system, but for debugging and testing it's great, because I can see exactly which words are being passed to by_vowel_count. And so if I run this now, we will see print(sorted(words, key=by_vowel_count)). Look what I get. It shows me all the words that were checked, and shows that each word was checked once and once only, and then we see that the words are indeed sorted by the number of vowels in them.

What about sorting file names by file length? So I'm gonna get a little help here. I'm gonna use the glob module and the os module, and I'm now gonna get a bunch of... what? Oh, sorry. All right, all right. So, sorting file names by file length. All right, so what are we gonna do? We're gonna take a file name, and we're gonna run os.stat on it to get its size. Fantastic. And now I'm gonna get all the text files in the current directory, and I'm gonna sort them by file length. You don't know these files, but I promise you this actually happens to be the case: it's from the smallest file to the biggest file. I can also sort the file names by the files' vowel count, where I use the same sort of thing as I did before, but now I go through the entire file line by line, go through each character in each line one by one, and count the number of vowels. And then we'll see here that I can actually sort the file names by vowel count.

Well, let's get even fancier. What if I have a list of dictionaries? So here are some people, my children and me; they enjoy appearing in my slides, or so I think. We have first names, last names, and ages. So can I sort this list of dicts? Nope, can't do that. Why? It's the same sort of error we saw before: Python does not know how to use less than between instances of dict. It doesn't know how to compare dictionaries in this way. But we can use a key function. So let's say I want to sort these people by age. I'm gonna define a key function, by_age(d), and it'll just return the value stored with the age key.
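Going back to the custom key functions for means, vowel counts, and file sizes, here is a minimal sketch of all three. The function names follow the talk; the sample data, the sentence, and the throwaway files created in a temporary directory are assumptions made here so that the example runs anywhere.

```python
import glob
import os
import tempfile

def by_mean(one_list):
    return sum(one_list) / len(one_list)

sublists = [[10, 20, 30], [1, 2], [100], [5, 5, 5, 5]]       # illustrative data
print(sorted(sublists, key=by_mean))   # [[1, 2], [5, 5, 5, 5], [10, 20, 30], [100]]

def by_vowel_count(word):
    print(f"\tchecking {word}")        # debugging aid only, not for production
    total = 0
    for letter in word.lower():
        if letter in 'aeiou':
            total += 1
    return total

words = 'this is a fascinating sample sentence'.split()      # illustrative sentence
print(sorted(words, key=by_vowel_count))
# ['this', 'is', 'a', 'sample', 'sentence', 'fascinating']

def by_file_length(filename):
    return os.stat(filename).st_size   # file size in bytes

# Create three throwaway text files so that the example is self-contained
with tempfile.TemporaryDirectory() as dirname:
    for name, size in [('big.txt', 300), ('small.txt', 1), ('medium.txt', 20)]:
        with open(os.path.join(dirname, name), 'w') as f:
            f.write('x' * size)
    filenames = glob.glob(os.path.join(dirname, '*.txt'))
    smallest_first = [os.path.basename(one_filename)
                      for one_filename in sorted(filenames, key=by_file_length)]

print(smallest_first)                  # ['small.txt', 'medium.txt', 'big.txt']
```

The debugging print shows one "checking" line per word, which matches the point that the key function is called once per item.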
And now if I say print(sorted(people, key=by_age)), sure enough, I get the list of dictionaries back, and they're sorted by age. What if I want to sort by multiple keys, like last name and then first name? Well, then what I'm gonna do is have the key function return a tuple, because Python knows how to sort tuples. So in by_last_first, I return first the last name and then the first name. And sure enough, if I now say print(sorted(people, key=by_last_first)), what am I gonna get? I'm gonna get the people sorted first by last name, then by first name. So you'll see here I come first, because, you know, my last name is shorter, and then my children, all with the same last name, but sorted by their first names. Pretty snazzy.

Now the thing is, why should I write a key function? I want to be lazy, right? I can use lambda. lambda creates an anonymous function, and as long as you are passing a function object to the key parameter, you're fine. So here I'm gonna use lambda, and it's gonna do exactly what I did before: it takes a dictionary and returns a tuple with last and then first, so it will sort in exactly the same way. But you know, I can do even better. A more Pythonic, modern approach is to use operator.itemgetter, and it's a function that returns a function.
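The dictionary examples can be sketched as follows. The keys first, last, and age and the function names by_age and by_last_first follow the talk; the people themselves are made up here, since the names on the slide are not in the recording.

```python
people = [
    {'first': 'Dana', 'last': 'Smith', 'age': 40},    # illustrative people
    {'first': 'Avi', 'last': 'Jones', 'age': 12},
    {'first': 'Bea', 'last': 'Smith', 'age': 9},
]

dict_error = None                       # illustrative name
try:
    sorted(people)                      # dicts do not support <
except TypeError as e:
    dict_error = e
    print(f"TypeError: {e}")

def by_age(d):
    return d['age']

def by_last_first(d):
    return d['last'], d['first']        # a tuple: last name first, then first name

print([d['first'] for d in sorted(people, key=by_age)])           # ['Bea', 'Avi', 'Dana']
print([d['first'] for d in sorted(people, key=by_last_first)])    # ['Avi', 'Bea', 'Dana']

# The same ordering with an anonymous function
with_lambda = sorted(people, key=lambda d: (d['last'], d['first']))
print([d['first'] for d in with_lambda])                          # ['Avi', 'Bea', 'Dana']
```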
So if I want to sort people by age, I can just say key equals operator.itemgetter('age'). That will return a function that will indeed invoke square brackets with 'age' on each element, and thus allows us to sort them that way. And we can even do better, or differently. We can say key=by_last_first to sort by last name and then first name, or I could say operator.itemgetter('last', 'first'), and then it's first gonna get the last name, and then it's gonna get the first name. Exactly what I want.

Well, what if I have my own class? What if I want to sort objects? So here's a class Person, with exactly the same data as before, just a class rather than a list of dictionaries. And now I have a list of instances of Person, and if I want to print sorted(people), what am I gonna get? That's right, another error. Why? Because less than doesn't work on these. How are we gonna change that? We're gonna implement less than. That's right: the __lt__ method allows me to do that. And I might as well add __eq__, to provide for equal-equal as well. And if I do that, now I actually can sort. You want to sort by last name, then first name? Fine, we can do that too.

But wait a second. What if I want to check greater than or equal? We've defined less than, and we've defined __eq__. What if I try to say people[1] >= people[0]? Now, that won't work, because I've defined less than, but I've not defined greater than or equal. What are you talking about? This is where total ordering comes in, folks. From the functools module, I can use the total_ordering decorator, and total_ordering will use __lt__ and __eq__ and build out all of the rest of the comparison methods automatically on your class. Pretty darn cool. So now if I say, hey, is people[1] >= people[2]? Oh, yeah, that's fine.
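Here is a minimal sketch of operator.itemgetter and of a sortable class built with functools.total_ordering. The sample people are made up here, and the exact body of the Person class on the slide is not in the recording, so this version, which compares by last name and then first name, is an assumption.

```python
from functools import total_ordering
from operator import itemgetter

people_dicts = [
    {'first': 'Dana', 'last': 'Smith', 'age': 40},    # illustrative people
    {'first': 'Avi', 'last': 'Jones', 'age': 12},
    {'first': 'Bea', 'last': 'Smith', 'age': 9},
]
print([d['first'] for d in sorted(people_dicts, key=itemgetter('age'))])
# ['Bea', 'Avi', 'Dana']
print([d['first'] for d in sorted(people_dicts, key=itemgetter('last', 'first'))])
# ['Avi', 'Bea', 'Dana']

@total_ordering                        # fills in <=, >, >= from __lt__ and __eq__
class Person:
    def __init__(self, first, last, age):
        self.first = first
        self.last = last
        self.age = age

    def __repr__(self):
        return f"Person({self.first!r}, {self.last!r}, {self.age!r})"

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return (self.last, self.first) == (other.last, other.first)

    def __lt__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return (self.last, self.first) < (other.last, other.first)

people = [Person('Dana', 'Smith', 40), Person('Avi', 'Jones', 12), Person('Bea', 'Smith', 9)]
print(sorted(people))
# [Person('Avi', 'Jones', 12), Person('Bea', 'Smith', 9), Person('Dana', 'Smith', 40)]
print(people[0] >= people[1])          # True: >= was supplied by total_ordering
```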
That's True. But why should I work so hard? Python 3.7 introduced data classes, and they reduce our code for creating classes by a lot. And they automatically know about sorting. Watch this: from dataclasses import dataclass, and now here's my class: @dataclass(order=True), class Person, with first: str, last: str, age: int. So I need to import dataclass from dataclasses, I need to apply the decorator, and I need to say order=True; that doesn't happen by default. But then it's gonna sort according to these attributes. So how's it gonna sort them? Let's take a look here at my people, and let's print them out sorted. And we get... that's not sorted at all. Or it is sorted, but how? Guess what: it's sorted by first name, then last name, then age. Why? Because that's the order I specified the attributes in. So let's redo this a little bit. Let's say the attributes are now gonna be last, then first, then age. And of course, because the order changed, I'm gonna need to change how I create my list of people. And now when I sort, sure enough, it sorts first by last name, then by first name, and then by age. So it works just great.

So Python's sorted function lets you sort anything, and the key, ha ha, is the key function. Right? Any function or method that takes a single input and returns a sortable output is fair game. You can write your own, you can use lambda, you can use operator.itemgetter. Classes are sortable: all they need to do is define a few magic methods, and if you use data classes, it's even easier than that. Cool. We covered a lot of stuff. I'm very happy, in my few remaining minutes, to answer questions. Let me know.

Thank you so much. That was actually very insightful; I enjoyed that a lot. I wish I could get this... ah, here we go, here we go. I just have to hit this button. There we are. Oh, I thought there was noise on the line for a moment. Thank you.
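Going back to the data class shown near the end of the talk, a minimal sketch might look like this. The field order (last, first, age) is the reordered version described in the talk; the type annotations and the sample people are assumptions made here.

```python
from dataclasses import dataclass

@dataclass(order=True)        # order=True generates <, <=, >, >= from the fields, in order
class Person:
    last: str
    first: str
    age: int

people = [
    Person('Smith', 'Dana', 40),      # illustrative people
    Person('Jones', 'Avi', 12),
    Person('Smith', 'Bea', 9),
]

for one_person in sorted(people):     # by last name, then first name, then age
    print(one_person)
# Person(last='Jones', first='Avi', age=12)
# Person(last='Smith', first='Bea', age=9)
# Person(last='Smith', first='Dana', age=40)
```

The comparison works as if each instance were a tuple of its fields, which is why the order in which the attributes are declared decides the sort order.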
That was very impressive. So, yeah, I actually do have a couple of questions here, and we'll see how many we can get through. One person, Vinicius, asks: is sorting in the database faster or more efficient?

Oh my God. If you're using a relational database, please use the database to do your sorting. Unless it's a really bad database, it's gonna do things better, for two reasons. First of all, it's almost certainly implemented in C, so it's just gonna run faster. And second of all, it's probably got all sorts of extra indexing thrown in there. But the third reason is that you don't want to pull all the data from the database into memory, into Python data structures, and then sort it. It's much better just to get it sorted in advance.

Okay. Pascal says: awesome, Reuven. Do you know if key functions can be used to sort data frames as well?

Oh, yes, but not in exactly the same way. So pandas data frames are sort of like Excel spreadsheets, although soon Excel will be saying it's sort of like pandas, right? But basically, you can sort in all sorts of different ways. I know that you can apply functions there. I can't remember if the syntax is exactly the same. It's not exactly the same as key functions in sorted, but it's really close and similar.

All right. Israel asks: how would you put multiple sort options on a class or a data class?

Yeah, I don't think there's a good answer to that, unfortunately. I mean, you would really need to use a key function there. You might be able to have special methods, right, because you can define as many methods as you want in your class, or even a data class. But then you would have to specify it.
sorted is going to use whatever the object's less than, greater than, and equal-to say. I mean, I guess if you really wanted to be sort of weird, you could have some sort of attribute that you could switch to choose how you want to do sorting, and then __lt__, __eq__, and so forth would switch accordingly. But I don't know if I'd recommend that.

Okay. And then actually, one more I have from somewhere else. And if anyone else wants to ask anything else, there's talk-sorting-anything, and you can ask additional questions there and chat with Reuven. So the last question, then, and it's a little bit outside the realm of Python: have you heard of dual-pivot quicksort?

I have not. But now I'm going to write it down and find out about it.

Excellent. All right, so I think that's all we have. So once again, thank you so much.
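On the question about multiple sort options for a class, the answer given was to use key functions. One way that might look, as a sketch only, is with operator.attrgetter, which is the attribute counterpart of the itemgetter function mentioned in the talk. The Person class and the sample people here are illustrative.

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass                            # no order=True: each sort chooses its own key
class Person:
    first: str
    last: str
    age: int

people = [
    Person('Dana', 'Smith', 40),      # illustrative people
    Person('Avi', 'Jones', 12),
    Person('Bea', 'Smith', 9),
]

by_age = sorted(people, key=attrgetter('age'))
by_name = sorted(people, key=attrgetter('last', 'first'))
oldest_first = sorted(people, key=attrgetter('age'), reverse=True)

print([p.first for p in by_age])          # ['Bea', 'Avi', 'Dana']
print([p.first for p in by_name])         # ['Avi', 'Bea', 'Dana']
print([p.first for p in oldest_first])    # ['Dana', 'Avi', 'Bea']
```

This keeps the class itself free of any single built-in ordering and leaves the choice to each call to sorted.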