Good morning, PyCon India. Thank you so much for inviting me to keynote. I wish we could have done this in person. However, it's shortly after 4 a.m. where I am, so give me just a couple of moments and we'll get started with our keynote for this morning.

Okay, let's get started. This is "What Does It All Really Mean?" We're at PyCon India, it's Saturday, October 3rd, 2020, and I'm James Powell. If you like this talk, you can follow me on Twitter at @dontusethiscode. I give a lot of talks of a similar nature, and hopefully, if this is something that you're interested in, you'll have a chance to see more of this in future.

Now, I wanted to give you a little bit of context for what this talk is about. Some people have always asked me: why "don't use this code"? What does this mean? You might think it's because in the past I've given a lot of talks that were kind of gimmicky or frivolous, about small niche details or doing things you're not supposed to do. And it is very much the case that "don't use this code" was intended to be a disclaimer: don't take this too seriously, don't use this code. However, originally "don't use this code" was really more about a particular approach that I was taking in thinking through particular problems in order to derive actual practical benefit for myself.

To give you a little bit of context around what that procedure was: I often do consulting work, coaching work, or training work with people who have to do programming but aren't actually programmers. Think of an optical engineer, a network engineer, a physical scientist, or a data scientist. Programming is a part of the work that they do, but they never call themselves programmers. As a consequence, sometimes they're hesitant to employ a certain degree of technical sophistication, or they're even hesitant to focus on certain details, because they wonder: do the details matter? Is this really giving me value in my life?
Is it really helping me out to learn all the nitty-gritty details of a language or an API or a tool set? It's just a means to an end for me. And it is very much the case that for a lot of their work the details don't really matter: just make it work, get the results you want, move on to the next thing. But it's also the case that sometimes the details do matter. I think all of you have had a situation where you've seen some code that was pushed to production by somebody who was a little bit sloppy with the details, or didn't employ a sufficient degree of technical sophistication, and the work had some failure or deficiency that led you to have to go and fix it later, or to rewrite it from scratch. So we know that it's not quite the case that the details strictly don't matter, or that they strictly do matter.

What I want to show you is a particular approach that I take, where I try to motivate meaning through investigation of the details. This is really the ethos of why I started with "don't use this code". The whole idea is this: if you take a really, really close look at the details, and you go as deep as you possibly can, then on the way down and on the way back up you're going to find meaning along the way. That meaning is not going to be gimmicky or niche or inapplicable to actual problems. It's going to be something that gives you actual value in the actual things that you solve, or the actual things that you have to deal with in your daily work.

In order to really show you what I mean, I want to start with an example. We're going to presume that we're doing some kind of code review. We've run across a little bit of code, and there are some details in that code. In fact, the details in this code are very, very small. I want to show you how these tiny, almost trivial details can actually give quite a significant amount of meaning and understanding to the work that might not be visible otherwise. Along the way, I
want to introduce you to the thinking process that I take. But let's get started.

Let's say that you run across a line of code that looks like this somewhere in a code review. There's not a whole lot you can say about this. It's very simple. Okay, maybe you could complain about the variable name, but beyond that there's really not anything interesting there. What is interesting here is that, instead of writing xs as a list of the numbers one, two, and three, somebody could have chosen to write it as a set of the numbers one, two, and three. Somebody could have chosen to write it as a tuple of the numbers one, two, and three. Somebody could have chosen to use a NumPy ndarray in order to store this data. Somebody could have chosen to use a pandas Series. They could have chosen to use a pandas array, if they were particularly clever and using an up-to-date version of pandas. Or they might have even chosen to use a pandas DataFrame, but in this particular case that is probably a little bit gratuitous.

Now, what's the difference between these? Why did somebody make one choice? Does it really matter? Generally, who cares? Why would the choice that somebody made to use a set here, versus a list, versus a tuple, make any difference to anybody's life?
Well, if you think about it, if the goal here was to store a couple of numbers and then just compute something on them, like their sum: you store them in a list, the sum is six. You store them in a set, the sum is six. You store them in a tuple, the sum is still six. You store them in a NumPy ndarray, the sum is still six, and you just have two ways to compute the sum: this API and the NumPy API. You store them in a pandas Series, again the sum is still six. You put them in a pandas array: surprise, the sum is still six. You put them into a DataFrame, the sum is still six, with the small caveat that this doesn't actually work; you have to write this instead. But this is three characters different, and they're characters where you'd find the difference between these two quickly. You'd look it up on Stack Overflow. You'd ask somebody to help you out. What does it really matter? Who genuinely cares? Why does anybody care about the difference between these types?

This is a very small, very minor detail in a very small question. But as we go through this exercise, I want you to see how this question can drive an enormous amount of meaning and understanding for what Python is about and what these tools are about. So let's get started.

Our goal here is to determine what this all means. And not only what it all means, but what it all means in the context of things that will actually help us write better code, that will help us improve our lives in some measurable way. What we're going to do is embark upon a thinking exercise. This is the thinking exercise that I employ as part of all the talks that I give, and as part of all the consulting work, all the training work, and all the coaching work that I do. Now, you might look at this term "thinking exercise" and wonder: why does he say "thinking exercise"? Why doesn't he say "thought experiment"? Hold on a moment. This isn't so fancy that we call it a thought experiment.
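The sum comparison from earlier in this passage can be sketched roughly as follows. The slide code is not part of the transcript, so the names `containers` and `totals` are illustrative assumptions, and the NumPy and pandas variants are left out to keep the sketch free of third-party dependencies.

```python
# Three built-in containers holding the same numbers.
containers = {
    "list": [1, 2, 3],
    "set": {1, 2, 3},
    "tuple": (1, 2, 3),
}

# The built-in sum() accepts any iterable, so the choice of container
# makes no visible difference to the result.
totals = {name: sum(values) for name, values in containers.items()}

for name, total in totals.items():
    print(f"{name:>5}: sum = {total}")
```

Every entry in `totals` comes out as 6, which is exactly why the choice of type looks unimportant at first glance.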
It's just a thinking exercise.

So the very first place we'd start is: let's just compare some of these options one on one. Let's compare the option where they encode this data as a list and the option where they encode this data as a set. And we'll make this example even simpler: we'll just have an empty list and an empty set. So here we have an empty list, and here we have an empty set. What's the difference? It's two characters different, slightly different in terms of whether I do or do not need the shift key on my keyboard. They're actually the same keys on this particular keyboard. There doesn't seem to be anything particularly interesting here, and it doesn't seem to be a detail that somebody who's not a programmer is going to really care about: oh, use the square brackets, not the curly braces.

Now, one thing to clarify: this is not actually a list and a set. I hope you all were able to catch it. This is actually an empty list and an empty dictionary. This is an empty list, and this is an empty set. I hope you all caught that.

Going back to our example, we might say the difference between these two is that the list is a sequence and a collection, whereas the set is a set. One interesting thing that we're going to see in a moment is that what's actually meaningful here about the list is not that it's a Collection with a capital C, but that it's a collection with a lowercase c. When you look at that, this is an even smaller difference than the choice of punctuation that I'm using in order to create this type. Whether it's a capital C or a lowercase c is a very small distinction, and yet it is surprisingly meaningful.

Now, what do we mean when we say that this is a sequence?
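A small sketch of the empty-literal catch and the capital-letter categories mentioned above. The variable names are assumptions for illustration; the abstract base classes come from the standard library's `collections.abc` module.

```python
from collections.abc import Collection, Sequence, Set

empty_list = []
empty_dict = {}      # curly braces alone give a dict, not a set
empty_set = set()    # there is no literal for an empty set

kinds = [type(obj).__name__ for obj in (empty_list, empty_dict, empty_set)]
print(kinds)

# Both the list and the set are capital-C Collections,
# but only the list is a Sequence and only the set is a Set.
list_is_sequence = isinstance(empty_list, Sequence)
set_is_sequence = isinstance(empty_set, Sequence)
set_is_set = isinstance(empty_set, Set)
both_collections = (
    isinstance(empty_list, Collection) and isinstance(empty_set, Collection)
)
print(list_is_sequence, set_is_sequence, set_is_set, both_collections)
```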
Well, we might know that in the collections module in the standard library, in the abc submodule, there's a capital-S Sequence object. We can import it and ask: is a list an instance of a Sequence? And Python will say yes, a list is an instance of a Sequence, or rather this particular list is an instance of a Sequence. We could do the same thing with the set. We could ask: is this set an instance of collections.abc.Set? That doesn't seem to be particularly interesting. It seems to be a formality. It seems to be something that somebody might care about at the edges, but we still haven't motivated why it matters that one is a set and one is a sequence. We haven't motivated how this changes my life, or how it helps me do anything that I want to do better. We've just added some jargon, some vocabulary.

In fact, we could try to take this a little bit further and say: hold on a second, it's not just that it's a Sequence with a capital S. It's that, because it's a sequence, it supports these indexing operations. You have this __getitem__ protocol. You can say: give me the zeroth element, the first element, the second element. And you can even do some kind of subsetting of this data. You can say: give me a slice from the zeroth up to but not including the first element. That's a little bit more interesting. That's an operation that you can perform on one of these but can't perform on the other.

But again, somebody who knows a little bit of Python might look at that and say: well, a dictionary is not a sequence, but I can also perform that square-bracket indexing. So the core meaning here is not that you can put square brackets after this, that you can use the __getitem__ protocol. There's something more here, because it's not just the __getitem__ protocol that distinguishes one as being a sequence and the other as being a non-sequence. We could even imagine that somebody, somewhere, had chosen to make slice objects hashable.
Then we could even do something very similar, visually and formally in terms of the syntax, to the slicing operation on the dictionary. Unfortunately, this doesn't actually work, because the slice object isn't hashable, and so you can't look up the key of the slice object in this dictionary. But ultimately we still haven't got to anything that's actually helping us in our day-to-day lives.

Now, if we look closely at the Sequence object, we can see it has a register function. We could even tell Python to register the dictionary as a Sequence, and then when we ask Python whether a dictionary is a Sequence, it'll say yes. So this Sequence object in collections.abc really isn't that interesting. It's answering a question that we might ask at some point once we have already determined why it is interesting which of these two choices somebody has made. But the fact that we can arbitrarily register something, just because it has some particular syntax, isn't quite getting to what the distinction is here. If we wanted to go even further, we could say: I can see that there's a Sequence and a MutableSequence (we'll talk about this a little bit later), and we could register the dictionary as both a Sequence and a MutableSequence at the same time. But again, we're not really getting to the core of what this actually is in order to derive something of practical benefit.

With the set we can try the same exercise, but we'll see that the answer is a little bit closer to the surface. We can ask this set object: are you a Set? It'll say yes. We can ask: what does it mean to say this thing is a Set?
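The indexing, slicing, and registration steps described above might look roughly like this. It is a sketch under assumptions: the helper `try_slice` and the variable names are hypothetical, not taken from the slides. On the interpreters of the talk's era, slicing a dict fails with a TypeError because slice objects are unhashable; slice objects became hashable in Python 3.12, where the same lookup fails with a KeyError instead, so the sketch accepts either.

```python
from collections.abc import Sequence

xs = [1, 2, 3]
first = xs[0]        # __getitem__ with an integer index
head = xs[0:1]       # __getitem__ with a slice object

d = {0: "a", 1: "b"}
by_key = d[0]        # same square-bracket syntax, but a key lookup


def try_slice(mapping):
    """Hypothetical helper: name of the exception raised by slicing a dict."""
    try:
        mapping[0:1]
    except (TypeError, KeyError) as exc:
        return type(exc).__name__
    return None


slice_error = try_slice(d)

# Registration is purely nominal: it changes what isinstance() reports,
# not what the object can do. Note that it affects the whole process.
before = isinstance(d, Sequence)
Sequence.register(dict)
after = isinstance(d, Sequence)

print(first, head, by_key, slice_error, before, after)
```

After registration the dictionary is "a Sequence" as far as `isinstance` is concerned, yet it still has no ordering-based indexing, which is the point being made about the ABC answering only a nominal question.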
Obviously it means that it's been registered as a Set. But setting that aside, we can say: maybe it means that you have the ability to perform certain operations on this, the operations encoded by the ampersand, the pipe, the caret, and the hyphen. And it's not just that you can apply this particular syntax. Certain things in the collections.abc module, like Callable, really just mean: can you apply the open-close parenthesis syntax after the object? But you can see the syntax itself is not what's interesting here, because I can ampersand, pipe, caret, and hyphen an integer as well, and it's not a set. It's that when I use the ampersand, pipe, caret, and hyphen, I actually mean something. By the ampersand I mean: find me the elements in common, the set intersection. By the pipe I mean: find me all the elements, the set union. By the caret I mean: find me the elements unique to one side or the other, the set symmetric difference. By the hyphen I mean: take all the elements of this and subtract the elements of that, the set difference.

And here we can see: oh, something is a set, that's interesting. It's akin to a mathematical set. I can perform set-like operations. In other words, say I have some data and I need to figure out from that data: What are the elements in common? What are the elements that are unique? What are the elements in one but not in the other, and vice versa?
Oh, I probably want to use a set. That's actually valuable to my life, because if I see a problem of that nature, then I know what I want to choose. But we're still not quite sure why this list is interesting as a sequence.

By the way, when we talk about the set, we can also say that there's another notion here: the elements are unique, whereas with the list we don't have this guarantee that the elements will be unique. So if it were the case that we had some problem where we wanted to say, "I don't want to worry about duplicates, automatically elide duplicates," the set type would be very valuable for us.

But the real question here is: is this meaningful? And I think it is meaningful that we typify this thing as being a set. We can see that there are certain operations that mean something to a human being, that help a human being solve a human problem they want to solve, that help a human being model data in a particular way they want to model it. But we're not quite sure how that applies to the notion of the list being a sequence. Well, let's step back a bit, separate ourselves from the notion of the sequence being just the indexing operation, and think about this for a moment.
We can say the set isn't a sequence, but the list is. Well, one of the things that's implicit about being a sequence, what makes the list a sequence but not the dictionary, is that it has an ordering: a human ordering, an ordering that a human being cares about. And just like I can say I have some data and my problem is to find the common elements or the disjoint elements, maybe I have some data where what's important for modeling that data is when the data arrived and when the data will be serviced, whether this is first in, first out or last in, first out. So a sequence, or an ordering, is an interesting property that may or may not exist in my data. If I need to model, for example, something that you might call a queue, then maybe a list might be valuable, or maybe even a collections.deque. If I need to model something akin to a stack, maybe a list type is valuable, because that ordering principle that I get from it being a sequence is actually what I'm looking for.

And it's surprising: in my experience working with semi-technical and non-technical users, a lot of them actually very much understand the notion that this is a first-in-first-out or a last-in-first-out algorithm. So when you tell them, "Oh, we'll use a set here," or sorry, "we'll use a list here, because we need that sequencing behavior," it's something that they might be able to understand and appreciate. So I would argue that the list being a sequence, if we dig a little bit deeper, is actually something that's meaningful.

Now, the distinction here is not super meaningful. They're not particularly similar, the list and the set. But you can see that with this comparative exercise we can try to tease apart some differences. If we want to really drive for a little bit more meaning, let's take a look at another distinction.
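The two modeling choices contrasted above, set operations on one side and human ordering on the other, can be sketched as follows. The data values and variable names are made up for illustration.

```python
from collections import deque

# Set: the operators carry mathematical meaning.
a = {1, 2, 3}
b = {3, 4}
common = a & b        # intersection: elements in both
either = a | b        # union: all elements
exclusive = a ^ b     # symmetric difference: in exactly one side
only_a = a - b        # difference: in a but not in b
deduplicated = set([1, 1, 2, 3, 3])   # duplicates are elided

# The same four operators also work on integers, which are not sets,
# so the syntax alone is not what makes something a set.
int_results = (6 & 3, 6 | 3, 6 ^ 3, 6 - 3)

# List as a stack: last in, first out.
stack = []
stack.append("first")
stack.append("second")
last_in = stack.pop()

# deque as a queue: first in, first out.
queue = deque()
queue.append("first")
queue.append("second")
first_in = queue.popleft()

print(common, either, exclusive, only_a, deduplicated)
print(int_results, last_in, first_in)
```

A plain list can also serve as a queue via `pop(0)`, but `deque.popleft()` avoids shifting every remaining element, which is presumably why the deque is mentioned as the alternative.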
Let's try to distinguish the list from the tuple. One of the reasons I bring this up and add this example here (there are one or two reasons) is this: I often ask, as a benchmark to get an idea of whether somebody really has a clue when they're using Python, what's the difference between a list and a tuple? In fact, I ask them a three- or four-part question: what's the difference between a list, a tuple, and a NumPy ndarray? I try to get a sense for what their answer is, and from their answer you can see whether they are stuck at the surface details, or whether they really understand what these things are and are able to employ these tools in a fluent way.

You can see that syntactically the distinction is quite minor. One has square brackets, one doesn't. I could put optional parentheses around this tuple. But it's not a syntactic distinction. Oftentimes when I ask people what the difference is between a list and a tuple, I get some trivial answers about some of their behaviors. But let's go through the exercise in a similar fashion as before. If we take a look at the list and the tuple, we might say they're both sequences and they're both collections. They both have indexing operations available to them. They both have some kind of ordering. They can both be sliced. They're both capital-C Collections, and I told you before that I actually care a little bit more about the lowercase-c collection. By the way, if we check this using the collections.abc module, we can see they're both Sequences and Collections. And if we ask whether they support the syntax that we want: can they be indexed? Can they be sliced?
They both can be indexed and can be sliced. But one thing that we might notice, if we go back to this notion of the Sequence versus the MutableSequence, is that the list is a MutableSequence but the tuple is not. So here we might have teased apart something that appears to be a difference: one is mutable and one is immutable. And we might say that's the meaning, that's the choice I'll make. A list and a tuple are basically the same thing, except one can be changed and the other one can't. So if I need the data to be mutable, I use a list; if I need the data to be immutable, I use a tuple.

I think that's missing the point. The distinction between "we have this list, we can change the elements in place" and "we have this tuple, we can't change the elements in place" is an important difference, and it's definitely something that will affect the code that you write. It's not completely trivial, but it doesn't really get to the meaning, because we haven't gone deep enough. So let's think a little bit deeper.

When we talk about mutating a list, if we look around at how we mutate a list, one of the most common ways is not by changing individual elements so much as adding or removing elements: appending elements to the end. Now, suppose we were to do a list append operation that looked like this, and then we were to slightly change our syntax and say somebody has written a function called append that exists somewhere, and that's what appends to the end of the list. I don't think there's a really major difference between these two syntaxes; it's a very minor difference. Now, if we were to take that syntax and extend it very slightly, by just doing one more assignment, and then actually implement append, we see something very interesting. Here we have a list, but we're treating this list as though it were an immutable type, and
nothing really major about how this code might subsequently be used has been affected. There may be some early-versus-late-binding, live-versus-snapshot-view changes that might affect us, because fundamentally people might expect the list to be mutable, and they might expect multiple references to all be updated in sync, versus a tuple being immutable, so that every time you need to perform some transformation you're making a copy. But if you think about it, and you try to extend your thinking beyond just the limits of Python, you might say: hold on a second, I also write a little bit of JavaScript, because I need to create a front end for the data science reports that I create, and I like to use Immutable.js in JavaScript. In Immutable.js they also have a List type, but the Immutable.js types are all immutable. And you can see that Immutable.js probably works kind of like this. What makes the list the list, what makes this list a meaningful thing, is not really mutability versus immutability. It's got to be something else. Mutability is definitely an important characteristic, and it's definitely something that affects your program, but there's something more here.

Now, if we were to try to play the same game with the tuple (this is the second reason I wanted to show this to you), we could think about this a little bit. Fundamentally, the tuple is some memory that sits somewhere, that had to be allocated by somebody. The memory was uninitialized at the beginning, and you had to put the elements into it. And so it stands to reason that a way to do this lives somewhere inside the CPython source code.
There's a way to mutate a tuple: it's necessary in order to create the tuple in the first place. It turns out that happens to be the function PyTuple_SetItem, and it turns out, if you look at PyTuple_SetItem, it's actually exported, because third-party extension modules may also need to create tuples to interact with other Python code. As a consequence, you can think: it's actually quite trivial to mutate a tuple. Just write a C extension module, or write some code in Cython, call PyTuple_SetItem, and change the tuple in place.

But you might back up there and say: this is actually quite trivial. You're talking about an implementation detail, and we can't really derive that much meaning from implementation details, because those details could change over time. Who knows, maybe some initiative to simplify the CPython C API might come into play, and we no longer have access to something like PyTuple_SetItem. It's entirely possible. And would that fundamentally change the meaning of the code we've already written? I'm not sure.

Now, if we were to take this example a little bit further and try to think: can I find a way to make that tuple mutable, and can I do that in a way that drives additional meaning? We might say: I'll define a function, and in the function I'll do an assignment, and then I'll look at the disassembly for that function. When I look at the disassembly, I'll see it looks something like this: you load a constant, the Ellipsis object; you store it to the x variable; you load the constant None; and you return that value. This function just happens to return nothing. And if you squint a little bit and zoom in, you might look at the STORE_FAST instruction and wonder: what does that thing do?
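The disassembly step described above can be reproduced with the standard library's `dis` module. This is a sketch that assumes the slide used a function of roughly this shape; the exact listing differs between CPython versions (newer releases add or merge instructions), so no particular output is shown here.

```python
import dis


def f():
    x = ...   # assign the Ellipsis constant to a local variable


# Print the human-readable bytecode listing for f.
dis.dis(f)

# Collect just the instruction names, e.g. to look for STORE_FAST,
# the instruction that writes to a local variable slot.
opnames = [instruction.opname for instruction in dis.get_instructions(f)]
print(opnames)
```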
So you might go a little bit deeper and say: in the C evaluator, in ceval.c, the Python main loop, we have this implementation for the bytecode that stores the thing. You might squint at this again and say that what's interesting about it is the SETLOCAL operation, and it turns out that SETLOCAL is a C macro. If you squint at that a little bit harder, you might say: that's actually calling the GETLOCAL C macro. And if you look at the GETLOCAL C macro, you can see there's something in the CPython implementation called fastlocals, and it's doing some kind of array lookup. So what it looks like is that whenever you're trying to store a local variable in Python, in the end it turns into an array access in C.

But here's something interesting: there are no bounds checks. Because there are no bounds checks, if you're doing a STORE_FAST and you can create bytecode, you can tell it to write to that array anywhere, and it doesn't even check that the index is non-negative. That means STORE_FAST is an easy way for you to do arbitrary memory access. If you could construct some poisoned bytecode, and in that poisoned bytecode you could say, don't STORE_FAST to the actual place where an actual variable is, but compute some offset that lets you write to arbitrary memory, and that arbitrary memory happens to be a tuple, then you've just made tuples mutable.

Now, I would share the proof of concept of this; we wrote one a couple of years back. It's way too big to put onto the screen. It involves spraying generators into your heap until you find one that's at the right distance from the thing, or rather spraying coroutines into your heap at the right distance. It's enormously complicated to create, and we're still investigating ways to just take this STORE_FAST and use CodeType in order to create poisoned bytecode. There's at least one proof of concept of this in play. Now, setting that proof of concept aside, we
might say again: this is really weird. It's really niche. This does not seem to benefit my life. I just want to write some code that works. You were going to tell me the difference between a list and a tuple, and you promised me that it was something that would be valuable to my life. You haven't made good on that promise yet. Well, let me see if I can come up with a different way. And here you can see why I wanted to talk about list versus tuple: I secretly wanted to show you a couple of different ways to mutate tuples. Let's talk about way number three.

We know there's a library called NumPy, and NumPy provides a data type called the ndarray. We might not be sure what that ndarray type is. In fact, it's one of the choices that we had in how to model this data, and because of that it's going to be important for us to figure out how this thing is different from the types we've seen so far. Now, if we dig around a little bit in NumPy, we might say: NumPy not only provides this type, it's also a library. It has things like linear algebra operations. If we dig around a little bit more, it has stride_tricks, and inside stride_tricks there's a function called as_strided. It's still not quite clear. We're not even sure what the ndarray is, and we certainly don't know what as_strided means. But our fundamental goal here is to mutate the tuple. Why? Not just because it's fun, not just because it's gimmicky, but because we're going to see something very interesting here.

If we were to try to mutate a tuple, we might write a function called tuple_setitem. It might take a tuple, an index, and a value. And what we're going to do is this: why don't we create an empty NumPy ndarray?
When we create it, we might say: tell NumPy that what you're creating is an empty array, and that what's contained in this array are uint64, 8-byte structures. Now, it might be interesting for us to think a little bit about what NumPy is. NumPy is actually a way for us to take a memory view, a view of memory, and to look at it in different ways. So it's important for NumPy to know: is this a uint64? What's the size of this thing? What are the strides? What's the dimensionality of this thing? And that's what as_strided does. It lets you tell NumPy: this block of memory, just look at it a little bit differently. This is akin to something like a C cast. You're not making a copy of some data. You're just saying: oh, an int64? It's actually eight int8s. And you can do that with NumPy, because NumPy is just this notion of taking some arbitrary block of memory, looking at it, and performing operations on it.

So what you can do is go into NumPy and say: hey NumPy, in your __array_interface__, tell me where the actual data for this empty array would be stored. What's that memory address? And it turns out that when we're talking about memory addresses, we have another way to look them up. The id function in Python is a way, at least in CPython up to today, that you can get the memory address of an object. Now, does the id function mean "give me the memory address of the object"?
No, it doesn't, and I can prove that to you very simply. Alternate Python implementations like IronPython use a monotonic counter for the id, so id meaning "memory address" is not correct. id means "unique identifier". It just happens to be the case that in CPython the id is the memory address.

Now, there's one other interesting thing here. If you know a little bit about NumPy and a little bit about Python, you might know that Python is going to heap-allocate your tuples, and you might know that NumPy uses raw malloc, not pymalloc. You might be able to guess that any NumPy ndarray is going to be allocated at lower memory addresses than any Python tuple, meaning that if you look at the distance between where this tuple is and where that ndarray is, it's always going to be a positive number. That means you can go to NumPy and say: you know that empty array? I was wrong. It's not an empty array, and it's not an array of eight-byte ints. It's actually single-byte values, and its size runs from exactly where the array was in low memory all the way up to where the tuple was in high memory. Uh-oh: you've told NumPy, "I now own all the memory in between these two." And you can tell NumPy:
You know what, I want you to give me a new NumPy ndarray called ys, and what that is is a little offset to write at the beginning of that tuple object that you found in memory. That's just of size four, a bunch of eight-byte elements. Why eight-byte elements? Because this tuple object, as it's stored and represented in CPython, likely has a couple of ints, a couple of pointers, things like that. You might also say: NumPy, give me a zs, and what zs is is an offset into this tuple object, this raw memory layout for the tuple object, and it's the size of the actual data that the tuple stores, because the tuple object actually stores its size and the underlying references to the things that it contains all in one memory block.

When you do that, well, it's easy-peasy. You tell Python, or rather you tell Python via NumPy: take that memory address where you store the size of the tuple and change that; take that memory address where the actual underlying ids, the actual underlying references, are, and add something else in there. And when you have this in place, you can take a tuple that has a little gap in it (it should be zero, one, two, None, four) and you can mutate the tuple using NumPy. How about that?

Now, if you take a look at all three of these, they're all very gimmicky. They're all very niche.
They're all dependent on implementation details. This last one is the most dependent on implementation details, despite being surprisingly safe and easy to do, in part because unfortunately not a lot of people use IronPython (everybody's using CPython), and in part because NumPy is available on all platforms and some of the assumptions that we made along the way are surprisingly common and safe assumptions to make. But even though none of these individually is particularly compelling, I think when you take them in combination, you can say this mutability versus immutability is not altogether that interesting. I can make a list immutable, or I can treat it as though it's an immutable type. I can make a tuple mutable. But I'm not really changing how I'm using this thing. I'm not changing what this thing means. It is the case that one is a Sequence and one is a MutableSequence. And it is the case that there are implications to this. For example, if you need to store one of these as the keys of a dictionary, well, the keys of a dictionary have to be hashable, and there is a relationship between mutability and hashability. And it may be the case that if you need to have some kind of structured key for your dictionary, your choice is a tuple and a tuple only; even if you wanted to use a list, you couldn't. But I want to get to a deeper meaning, and the deeper meaning relates to this capital-C versus lowercase-c collection. I told you that they're both capital-C Collections, but I told you that the list is actually, interestingly, a lowercase-c collection. That's where the meaning is. And so the question would be: what is the tuple? It's not a lowercase-c collection. It's something else. It's a record. Let me show you what I mean by that. When we think about the thing that we do to a list most often, what do we do? We append to it. We pop from it.
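Stepping back to the dictionary-key point for a moment, here is a minimal sketch of why a structured key ends up being a tuple. The coordinates and city name are made-up illustration data, not something from the slides.

```python
# A tuple is hashable, so it can serve as a structured dictionary key.
# The coordinates and the city name are hypothetical illustration data.
city_by_coords = {(40.7, -74.0): "New York"}
found = city_by_coords[(40.7, -74.0)]

# A list is a mutable sequence and is not hashable, so the same
# attempt with a list key fails with a TypeError.
try:
    {[40.7, -74.0]: "New York"}
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

print(found)
print(list_key_error)
```

The exact wording of the error message varies between Python versions, so the sketch only captures it rather than relying on it.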
Well, what that means is if you gave me some data and that data happened to be a list, I might not know how many times you appended to it and popped from it. I might not know the size of the thing. And what am I going to typically do with this? I'm going to do a for loop. I'm going to iterate over every element and I'm going to perform some operation on them. Well, because I'm going to perform the same operation f on every single element, every element of this list should support that operation. It's also important that if there are no elements, this code doesn't break; if there are many elements, this code just runs over all of them. And as a consequence of that, if I try and compare this to how I conventionally use the tuple, I might say, you know what? I don't really loop over a tuple in a for loop. I'm usually unpacking a tuple and I'm doing different things with the different parts of that tuple. You can see in both cases they're both sequences. They're both ordered. In fact, it turns out even the set is ordered. The distinction that's important between the list and the set was not that one was ordered and the other one was unordered. If you were to iterate over the set multiple times, you'd find the elements come out in the same order. You might not be able to predict it. The difference between the list and the set was that the list was not machine ordered.
The set was. The set had some ordering that facilitated fast lookup operations, that was decided by the machine: decided by the ordinal value of the hash value, subject to the open-addressing policy of the, you know, perturbative-probing hash table implementation, or the split hash table implementation. Sorry, the non-split table implementation, since split tables were only added in Python 3.6 for dictionaries alone. But what was interesting about the list was not that it was ordered versus unordered, but that it had a human order. And it happens to be the case that both the tuple and the list have a human ordering. It's just that we're using that human ordering differently. We're looping over it in one case and we're unpacking in the other case. Well, if we try and split that difference and we zoom in a little closer, we could say a consequence of this is going to be that in the list, adjacent elements are semantically similar or conceptually similar, because we're performing the same operation on everything that's contained in that list. They all have to kind of be the same thing, right? They all have to be a bunch of numbers, a bunch of personnel records, a bunch of components. But in the tuple we unpack and do different things with each of the components, and so they're semantically or conceptually dissimilar. And if we think about a Python collection type, the capital-C Collection type, it is the case that Python capital-C Collections are always capable of being heterogeneous in type. And even going back before the PEP 484 days, you know, we were always a little bit loose about what the type of something was. What does that mean?
Well, it turns out that if you have a list and you're performing the same operation on every element there, it's very likely that every element has to be homogeneous in type, sort of. Even with the PEP 484 work, we still consider, say, an int to be interchangeable with a float, or interchangeable with a complex, or interchangeable with a bool. But we generally kind of say, you know what, everything in the list is semantically or conceptually similar, and so that's how we decide this notion of homogeneity. Whereas in the tuple we say everything is structurally, conceptually, semantically dissimilar, and so we'd say this also would typically lead to heterogeneity in terms of the types that are associated with what's stored in it. And so when we look at this again, we might say, okay, that gives us a notion: the list is a collection with a lowercase c. It's just a bag of stuff. It's a bag of all kind of the same stuff. And it's important which is the first thing and which is the last thing, and the exact ordering of the things is important, but there's not a fundamental difference between the first element and the last element. Whereas the tuple is a record. It's a bunch of fields. It's very important what the first element and the last element are, because the position indicates what the thing is. The first element has some particular meaning that's not applicable to the second or third element, or may not be applicable to other elements. It is very much the case that, as we said before, the tuple exists as this immutable type in order to be used in a dictionary. It is very much the case that mutable versus immutable is important. But it's not the fundamental meaning here. The fundamental meaning here is that one is a collection type and one is a record. Now I wish we had a little bit more time for this presentation, because we could talk in greater depth about this hashability and immutability thing. This is another area where I see, sometimes, there's a little bit of confusion, oftentimes.
I see people say hashability implies immutability, or immutability implies hashability, and it actually turns out to be the case that hashability strongly suggests immutability, assuming one criterion, which is that you need some kind of random, or non-intermediated, access. And it turns out that you make mutable objects hashable very often in cases where the lookup has some intermediation. A very common example would be in a NetworkX DiGraph, where you're never really indexing, you're never doing a __getitem__ into the structure directly; you're always doing some kind of intermediated access to the elements via, you know, graph.nodes or graph.edges. You're asking the object itself to enumerate what's contained within it. You're not saying, oh, just give me this particular element randomly. But unfortunately, we don't have time for that. So we'll go back to our example and we'll talk about this list, this set, and this tuple. And let's make this example a little bit more specific, a little bit more concrete. Let's say that we're storing not just numbers.
We're storing host names. If we were to intentionally make a choice between the list, the set, and the tuple, we could convey a lot of meaning in that, and there could be a lot of meaning behind this. There could be something really there in terms of the choice that we make. It's obviously the case that we could choose any one of them. The code might work, might not work, and it might not affect the underlying functionality. But if we were to make this choice intentionally, there might be something there. So let's take a look at what differences we might see if we were to make one of these three choices. If we were to choose the list formulation, one of the things that we might try to convey to somebody is: we care about connecting to each of these machines in a particular order. That ordering is important. Which machine you connect to first and which machine you connect to last is very important. Now, in terms of the differences between the machines, there might be some modality hidden in there. There may be some predicate: you perform this operation on this machine versus that operation on that machine. But fundamentally, they should all be mostly similar. If you think about the set formulation, you're basically saying: I don't care about the ordering. I care about connecting to a machine, and if there is a duplicate, I only want to connect to the machine once. Make sure you connect to the machine, but there's no real difference between connecting to the machine first or last. Additionally, if you use the set formulation, you may implicitly be saying: you know what, I've got a bunch of different host names. What are the ones in common?
What are the ones that are not in common? Perform some set-like operations. And you can see this choice, even though it's a very small one and it's driven by a very small detail, really closely ties itself to how you go about solving this problem and what you'll be able to easily do using the structure that you've chosen. Now, if you choose the tuple: as we saw, the tuple is a different type of ordered structure. It is human ordered just like the list, but the human ordering has some notion of the different elements being contextually, conceptually, semantically distinct. In other words, it's a record, not a collection. And so here, if you had these two host names, you might be saying, well, you know what, this host name is the prod host name and this host name is the dev host name. Or you might even be saying, this is the primary host name that you connect to and this is the backup host name. And you may actually do fundamentally different operations. There are certain things that you do in prod that you wouldn't do in dev. There are certain operations you might do on the primary and not do on the backup. Now, if you think about it with this primary/backup example, you could also model this as a list. But there's a meaningful difference here. The meaningful difference is: if it happened to be that there were n backups, where there's some primary machine at the very beginning and then, if you don't hit it, you hit the next one, the next one, the next one, that's probably a list. But if it's strictly the case that there is a fixed modality of either primary or backup, and you're guaranteed that everybody has either a primary or a backup, and that there's a very stark kind of discrete difference between these two, as opposed to the continuous modeling of the list, well, maybe that's a tuple versus a list. Now what you can see here is that the details are very interesting, because we can dive very deep into the details and we can use the meaning that we get out of them to clarify a choice.
We can use these details in order to convey an intention. And so it's very important that we say: if you were to make this choice intentionally, then this is the result. Because it's very much the case that you might not be intentionally choosing between these three choices, and your code might still kind of work. It turns out the machine doesn't really care about this meaning. It's the human being that cares about this meaning. This meaning is valuable in terms of what it conveys to somebody else, and also in terms of how it helps guide you in the work that you're doing. But ultimately, the choice that you make may not actually make that much of a difference in terms of what's executed by the machine. Additionally, it's very important that when we talk about things like this list versus tuple view, this is an interpretation. This is my interpretation. I find that the deeper you go into thinking about the list and the tuple type and the set type and all of the built-in types, this interpretation really holds up. There are other places where this interpretation is further corroborated. But ultimately it's an interpretation, and it could very well be the case that you choose to just look at a list as an immutable, or sorry, a mutable tuple. However, I think that you lose a lot of fidelity in terms of what you can convey to somebody, and I think you find yourself going astray from how these things are typically used in practice. Now that was a view of the thinking exercise, this "don't use this code" exercise. You go deep, deep, deep into something, and then on your way back out you try and figure out: how is this meaningful? Why does this matter?
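To make the host name discussion concrete, here is a small sketch of how each of the three choices signals a different intention. Every host name and variable name below is a hypothetical placeholder; none of them come from the slides.

```python
# All host names below are hypothetical placeholders.

# List: a collection where order matters and every element is treated alike.
failover_chain = ["db1.example.com", "db2.example.com", "db3.example.com"]
attempts = [f"try {host}" for host in failover_chain]

# Set: membership matters; order and duplicates do not.
team_a = {"web1.example.com", "web2.example.com"}
team_b = {"web2.example.com", "web3.example.com"}
shared = team_a & team_b   # host names in common
only_a = team_a - team_b   # host names only team_a has

# Tuple: a record; each position has its own meaning, so we unpack it.
endpoints = ("prod.example.com", "dev.example.com")
prod_host, dev_host = endpoints

print(attempts)
print(shared, only_a)
print(prod_host, dev_host)
```

The machine would accept any of the three containers for most of this; the difference is in what a reader infers about order, duplicates, and whether the positions mean different things.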
I want to complete this exercise for you and I want to talk about the other choices that we had. And in going through the other choices, I want to give you a conceptualization of how all the pieces of NumPy and pandas fit together, and we're going to try and follow the same exercise and the same steps that we did before. And those steps are: collect the facts, collect the details. Look at as many deep, niche details as you can. Find differences, look for similarities. But then review conventions and review expectations. An example of a convention versus an expectation: a convention might be, how does somebody actually use this? An expectation might be, when somebody looks at this, what are they really thinking? Yeah, maybe a difference is here, but does somebody even notice that? Here's an example. There are multiple import styles. You can say import x. You can say from x import y. You can say import x as some alias. And when you look at those three, the majority of Python programmers don't really see a strong distinction between them. And yet there is a fundamental distinction. import x versus from x import y is an early versus late binding distinction. When you create that name, are you creating it as a live view or a snapshot view? Or rather, when you access the thing by that name, are you accessing it as a live view or a snapshot view? Is it early bound or is it late bound?
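The live view versus snapshot distinction can be sketched with a throwaway module built in memory. The module name `settings_demo` and its attribute `level` are hypothetical, invented only for this illustration.

```python
import sys
import types

# Build a throwaway module in memory; "settings_demo" is a hypothetical name.
demo = types.ModuleType("settings_demo")
demo.level = 1
sys.modules["settings_demo"] = demo

import settings_demo             # binds the module object itself
from settings_demo import level  # binds whatever "level" refers to right now

# Rebind the attribute on the module after both imports have run.
settings_demo.level = 2

live = settings_demo.level  # looked up on the module at access time
snapshot = level            # still the object that was bound at import time

print(live, snapshot)
```

Attribute access through the module is resolved late, at the moment of use, while the name created by `from ... import ...` was bound early and does not follow the later rebinding.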
Most Python programmers don't really see that distinction, and so their expectation is not going to be that this is a very important distinction. And so for the most part, when you're trying to do things like compare import styles, it's really going to come down to how much typing, and if I'm a data scientist, you know, what other two-letter alias can I find for the new package so that I can minimize the amount of typing that I have to do. Collecting all of these pieces, you need to apply some judgment. And this is a very difficult thing to do, because you have to really sit back and say: what can I justify? Can I put together a convincing argument for this? You might be right. You might be wrong. There may be a case where there is no right or wrong. And finally you derive some interpretation, which leads directly to some meaning. The reason that you want to do this is you want to be able to look at something and say: what is this thing, really? What does this thing really mean? And so let's take a look at all of these pieces of NumPy and pandas and let's figure out what these things really are. What do they really mean? We've seen the NumPy ndarray already and we talked about it as a memory view. We said it's one way for us to access some raw memory. And we said that NumPy has the ability to do something akin to a C cast. You can look at that memory differently. So you can say, oh, this memory is one linear sequence of nine elements, or you can say, oh, it's actually a three-by-three matrix. You can look at that and say, oh, you know what? It's a bunch of int64.
No, it's actually eight times as many int8s. That's kind of what the NumPy ndarray is, but the meaning goes even deeper than that. Because when you think about it versus the list, it's a sequence just like the list. In fact, it's a mutable sequence just like the list. And when we try and drive a difference between the two of them, we might say, well, the list is dynamically sized and the NumPy ndarray is fixed size. But you can make arguments for that: well, the NumPy ndarray is fixed size because it represents some raw memory at a location, and you're not really going to shrink or grow that; that might require a new allocation. But the list is some grouping of references to objects, and so, yeah, I could see there's a difference here. But we're still not quite getting to a core meaning. We could say that the list has a fixed shape. It's always some linear sequence, whereas the NumPy ndarray has a dynamic shape. It could be linear, could be two-dimensional, could be three-dimensional. We could even say, you know what? The list can only actually ever be linear, and the NumPy ndarray can have any number of dimensions such that when you multiply out the sizes of the dimensions you get the total size of the thing that you're looking at. But there's something a little bit deeper than that, and if you look at what you put into the list and what you put into the ndarray, and you think about what those things are and how Python works, you'll see a very interesting distinction. Let's say we have a list containing some integer objects, and what is an integer in Python? Well, it's not like an integer in C or C++, because it can't overflow. It doesn't have a bit width. It doesn't have a signedness.
It's not signed or unsigned. If I put some integers into a list and I operate on all of them, I kind of expect each one of these integers to support the same operation. But I don't really carefully look at the homogeneity of that thing, because, you know, if there were two integers and a floating point value, I might very well expect to be able to add one to all of them. And when I look at that integer I can see, you know, it has this nice auto-promotion behavior. It's what I might think of as a boxed type. Unlike in a language like Java, where you have boxed versus unboxed types, in Python everything is boxed. What that means is the list doesn't store the actual underlying data; it stores references to the data. Therefore the list is non-contiguous. As a consequence, operations on that list might need to jump around memory, and so they can't benefit from cache coherence. Additionally, because it's a boxed type, that boxed type can have behavior associated with it. So that might not be a list of integers. It might be a list of subclasses of integers. And when you perform an add operation on one of those integers, it might perform some stateful operation to mutate another one. And so the actual underlying behavior of these things is unconstrained. This dynamic dispatch has to happen at runtime, and there are no limits to what it actually does. What does that mean? Well, it means that if I have a list that I'm processing, can I auto-parallelize the processing of the list? No. Because the things that are contained in the list have some arbitrary, unconstrained behavior, and as a consequence processing the list from the front to the back versus from the back to the front might be meaningfully different. However, if I think about the NumPy ndarray, what do I typically put into a NumPy ndarray?
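Before answering that, here is a minimal sketch of the unconstrained behavior just described for boxed values in a list. The `Tally` class is hypothetical, made up purely to show an int subclass whose addition has a side effect.

```python
# Python integers have no fixed bit width, so this does not overflow.
big = 2**64 + 1


class Tally(int):
    """Hypothetical int subclass whose addition has a visible side effect."""

    log = []  # shared state touched by every addition

    def __add__(self, other):
        Tally.log.append(int(self))  # stateful: record which value was used
        return int(self) + other


values = [Tally(1), Tally(2), Tally(3)]

forward = [v + 1 for v in values]
forward_log = list(Tally.log)

Tally.log.clear()
backward = [v + 1 for v in reversed(values)]
backward_log = list(Tally.log)

print(big)
print(forward, forward_log)
print(backward, backward_log)
```

The numeric results are the same values either way, but the recorded side effects differ with the traversal order, which is the kind of thing that stops anyone from safely reordering or parallelizing work over an arbitrary list.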
It is the case that I could put PyObjects in a NumPy ndarray, if I weren't worried about things like, I don't know, potential memory leaks from circular references. There are some long-standing bugs related to this in NumPy. And it is the case that there are places where you often do put Python objects into NumPy ndarrays. For example, you know, a pandas Interval might show up in an index, so it might end up showing up in an ndarray. But ultimately, when you're using a NumPy ndarray directly, what are you usually putting into it? You're not putting into it an integer. You're putting into it an int64. Well, an int64 is an unboxed type, because the memory for that int64 is managed by the NumPy ndarray. Therefore that memory is contiguous. Therefore, if you want to perform some operation on an ndarray, you get cache coherence. And because it's a machine type, it has constrained behavior. It's an int64. You add one to it. You know exactly what's going to happen. You can't subclass it. There's no notion of any type system there. It's just some bits that have some operation that's understandable by the computer and can be represented, without dynamic dispatch, as probably a single, you know, low-cycle-count assembly instruction. When you operate on the NumPy ndarray, you can operate just like with the Python list: you can go through each element and apply an operation. But you typically don't do that. You typically ask the ndarray to perform the operation for you. And so you have this distinction in syntax, a very minor distinction in syntax. But what that immediately leads to is a notion that the NumPy ndarray, beyond just being some memory view, is actually a restricted computation domain. It's a way where you can come to terms with the fundamental inefficiency of Python. Python is too dynamic, it's simply too dynamic, and it has certain limitations around, uh, no ability to control memory layout. And as a consequence, if you need to get certain optimizations out of Python, what you do is you take
computationally intensive parts of your program. You draw a line around them. You call that your computation domain. You build a manager type (that's what the NumPy ndarray is) that manages everything inside that. Instead of allocating a bunch of Python objects, you have the manager type allocate raw memory and then manage it itself, and box and unbox on the boundaries. And because you have control over that domain, you add some restrictions to do things faster, to eliminate dynamic dispatch, to add in optimizations. That's what the NumPy ndarray is: it's some computational domain sitting within a bunch of Python code that does program structuring. So when you think about that, you can say, well, hold on a second. If I need to store a bunch of numbers, and those numbers are being stored in order to be able to do some mathematical work, some computational work, put them into a NumPy ndarray. But if I'm doing that for some kind of program structuring, like I'm printing them to the screen, or they're deciding some mode for what I do here or there, probably put them into a Python list. Now we can draw this distinction even clearer. We could say that if a list versus a tuple is a collection versus a record, a list versus an ndarray is a collection, just some opaque bag of stuff, versus a mathematical vector or matrix or tensor or something along those lines. And so you can think: a list is for storing a bunch of stuff; a NumPy ndarray is for storing some mathematical stuff. Now if we think about pandas arrays versus NumPy ndarrays, we can start to create a conceptualization of what pandas is all about. If we think about a NumPy ndarray that stores floating point values, we could store three values here, and each of these is a valid value. We've taken measurements one, two, and three. Now we can store another thing that's not quite a number. And we might say this is a NaN value. And this means we took three measurements and the third measurement was not
applicable, was missing, was erroneous in some fashion. So we have two actual values and one error condition. Here we have a modality. The data that we're storing can either be of this class or that class. It can either be a value or an error condition; it can either be one, two, or a NaN. And we have to find some way to encode that. Well, when we're encoding the actual values, the one and the two, we're using IEEE 754 double-precision binary floating point. And if we were to try and encode the NaN, we'd find that IEEE 754, in the bit patterns of how it's stored on disk, reserves certain bit patterns for things like infinities and things like NaNs. So if we're talking about, not a double-precision, but a single-precision IEEE 754 floating point type, this would be the bit pattern for a NaN. And this is not a valid value. This is a NaN: some sign bit, and then a pattern of a bunch of ones, and then some payload. It's surprising how few applications use the sign bit or the payload. There are actually very few bits that identify the NaN, and a bunch of bits that you can encode a bunch of other stuff into. Now if you think about encoding this modality, how would you do the same thing in an integer? Well, what integer value, what bit pattern, would you choose in order to encode an error?
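Before answering that for integers, the floating-point side can be sketched with only the standard library. This sketch uses the double-precision layout (1 sign bit, 11 exponent bits, 52 fraction bits) rather than the single-precision pattern on the slide, and the payload value `0x1234` is an arbitrary choice for illustration.

```python
import math
import struct

# Reinterpret the 8 bytes of a double-precision NaN as an unsigned integer.
bits = struct.unpack("<Q", struct.pack("<d", math.nan))[0]

sign = bits >> 63                   # 1 sign bit
exponent = (bits >> 52) & 0x7FF     # 11 exponent bits
fraction = bits & ((1 << 52) - 1)   # 52 fraction (payload) bits

# A NaN is any pattern with an all-ones exponent and a nonzero fraction,
# so the low fraction bits are free to carry a payload.
tagged_bits = 0x7FF8000000000000 | 0x1234   # 0x1234 is an arbitrary payload
tagged = struct.unpack("<d", struct.pack("<Q", tagged_bits))[0]

print(f"{bits:064b}")
print(math.isnan(tagged))
```

Only the exponent field and the "fraction is nonzero" condition identify the NaN; the sign bit and most of the fraction are left over, which is the spare room the talk refers to.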
Well, you can't choose zero, because that would be ambiguous with an actual zero. You can't choose negative one, because it would be ambiguous with an actual negative one. If you chose, like, the highest value, then you reduce the range of your integer, and historically integers have had very limited range. You know, a 32-bit integer can only get so big; a 16-bit integer can only get so big. And you really might want that range. And also, historically, people did not encode inside the integer type any bit patterns that were reserved for anything but values themselves. So as a consequence, if you need to actually work with real code and real data that somebody has already encoded and written for you, you can't do anything with the integer. And so what happens is, if you happen to put a NaN into a NumPy ndarray that stores integers, NumPy promotes everything to float64, and that might not be what you want. And that might actually result in certain problems, accuracy or precision problems, that you have. Well, if you think about this, a pandas array can store an NA type while still being an integer array. And the way that it does it is it doesn't encode the modality into the type itself. It encodes the modality out of band. It stores a mask, and the mask is: is this a false value or a true value? Is this a NaN or is this not a NaN? And it stores the data separately. And if you look at that data, that data happens to actually be a NumPy ndarray. And so you can see it's just an indirection on top of the NumPy ndarray to allow you to store out-of-band information. What other out-of-band information, other than these modalities, might you want to store?
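Before answering that, here is a hedged sketch of the promotion and the out-of-band mask just described, assuming NumPy and a pandas version with nullable integer arrays (pandas 1.0 or later) are installed. It uses only the public `isna()` method and does not touch the private attributes where pandas keeps the mask and the underlying data.

```python
import numpy as np
import pandas as pd

# Mixing a NaN into integer data makes NumPy fall back to a float dtype.
promoted = np.array([1, 2, np.nan])

# The nullable "Int64" extension array keeps the values as integers and
# tracks which entries are missing separately from the values themselves.
masked = pd.array([1, 2, None], dtype="Int64")
missing = masked.isna()  # boolean array: True where the entry is missing

print(promoted.dtype)
print(masked.dtype, missing)
```

The integer values never have to give up a bit pattern to mean "missing"; that information lives alongside them instead.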
Well, a very common case for the pandas array would be Categorical. You're not trying to store some modality. What you're trying to do is say: let me try and compress the data that I'm storing. Instead of storing a bunch of strings, where there are only three options for the strings, store a bunch of integers that are very compact and then map those integers to those strings, because it's some enumerated type. That's a pandas Categorical, and you can see how the pandas array makes that possible. Now if you think about the pandas Series, what is the pandas Series on top of the array? It is an indirection on top of an array. But let's ignore that for a moment and think about it as an indirection on top of a NumPy ndarray. Let's skip the middle. And if we think about that and we look deep into the pandas Series, we'll find a NumPy ndarray. There is that indirection. It is that ndarray plus something else. And in fact, if we were to construct a Series from a NumPy ndarray, we'd find that it turns out that it really is just a wrapping of that ndarray, to the point where if you mutated the original ndarray you'd end up mutating the Series. This is, by the way, one of the reasons why sometimes things like memory management are hard to assess in pandas. It's not always clear: does pandas make a copy of this data or not? And even within the value constructors for Series and DataFrames, sometimes a copy is made and sometimes a copy is not made. And the API is not always that clear, and the documentation definitely isn't clear about this. What makes the Series interesting? Well, it's not the same thing that makes the array interesting. It's not that indirection. What makes it interesting is the indexing. If you think about a NumPy ndarray as a sequence, it has some human ordering associated with it. You say this is the zeroth element, the first element, the second element. But what if you want to address those elements differently?
What if you want to describe the position of those elements differently? What if you want to say, give these elements the labels 10, 20, and 30, or give them the alphabetical labels a, b, and c, or, instead of taking the natural number labels zero, one, and two, just swap them around? Well, what a pandas Series gives you is a lookup modality. And the lookup modality is a distinction between being able to look something up by its integer position (is this row zero, is this row one, is this row two?) and some other lookup mechanism encoded in the index. Is this the element with label zero? Is this the element with label one? Is this the element with label two? The pandas index is very, very interesting, and there are a lot of details to it that give you a lot of meaning for how to use pandas correctly. Pandas indices can have hierarchy. They can have implicit versus explicit hierarchy. People who are scared of things like MultiIndex don't always realize that datetime indices in pandas are also implicitly hierarchical. And so there are certain operations that you might perform where you might not be able to guess, does this give you a Series or a DataFrame, without knowing a lot about the index. The deeper you go, the more you find that even things like the monotonicity of the index are very interesting in terms of what it will return to you when you perform a lookup operation. Unfortunately, we don't have time to go into this. But maybe if I have an opportunity to attend PyCon India next year in person, I'll tell you a little bit more about the pandas index. Instead, let's wrap this up and talk about the pandas DataFrame.
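The lookup modality described above can be sketched as follows, assuming pandas is installed. The values and the shuffled labels are arbitrary illustration data, chosen so that label and position disagree.

```python
import pandas as pd

# Labels are deliberately shuffled so that label and position disagree.
s = pd.Series([10.0, 20.0, 30.0], index=[2, 0, 1])

by_position = s.iloc[0]  # the first row, whatever its label is
by_label = s.loc[0]      # the row whose index label is 0

# The index also knows whether its labels are in increasing order.
is_sorted = s.index.is_monotonic_increasing

print(by_position, by_label, is_sorted)
```

Using `.iloc` and `.loc` explicitly keeps the two modalities apart, which matters most when the labels are themselves integers, as they are here.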
I told you this is a little bit gratuitous. You have a DataFrame with three elements in it. More likely you might have a DataFrame with two groups of elements, two columns. Oftentimes when we think about the DataFrame we think about it like an Excel spreadsheet. A bunch of cells; there are rows, there are columns. We might think about it as tabular data, but there's something a little bit more to it. And we can really formulate it in terms of the Series. We shouldn't necessarily think of the DataFrame as another layer of indirection on top of the Series, because it's not guaranteed to be the case that the DataFrame is composed of Series objects. You can take a DataFrame and extract Series from it. You can turn a Series into a DataFrame. But it's not like one is necessarily built on top of the other. That's not the conceptual structure here. Instead, what you can think is: a pandas Series is a notion of having some data and having some alternate way to access that data by some kind of index label. A DataFrame is the idea of having two-dimensional data and having an alternate way to access it on a major axis and an alternate way to access it on a minor axis. In other words, the index in pandas is an index, and the columns are also an index. I call this a major and a minor axis because there are operations you can easily perform on the pandas DataFrame index that you can't perform on the columns. For example, you look up columns by label, but it's actually a little tricky to look up columns by their position. And in fact, that's not something that's generally meaningful to a pandas user: this is column zero, column one, column two.
It's not something that people expect to be meaningful, so there's not really an ordering of them. There are just the labels there. And so the monotonicity of the columns, and slicing the columns, is not something that people typically deal with, although I've actually had cases where I wanted to represent a breakdown of data. I had columns that represented geographic regions, and I wanted to be able to put a MultiIndex on them. It's actually quite powerful, and it gives you the ability to really slice and dice data very nicely.

Now, when you think about the pandas DataFrame, instead of thinking about these silly DataFrames that just have two columns, let's think about one that has two named columns and an index. And we'll try and see what the meaning behind all this is, because I've told you a couple of facts, I've told you a couple of details to try and give you a conceptualization of what these things are. But here's the thing: what really improves your life is being able to fiddle less with pandas. It's being able to get your analyses done quicker. It's being able to represent these analyses more tersely or more directly.

When you think of the pandas DataFrame as this almost geometric structure with a major axis and a minor axis, and you think about all the transformations you typically want to do to a pandas DataFrame, well, you can kind of think about them geometrically. Do I want to collapse multiple rows into one row? Do I want to collapse multiple columns into one column? Do I want to take the columns and make them the index? Do I want to take the index and make it the columns?
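The idea that a DataFrame has an index on both axes can be sketched as follows. This is only an illustration under assumptions: the small frame and the region and measure labels (`north`, `south`, `sales`, `cost`) are hypothetical, chosen to mirror the "geographic regions as columns" case mentioned above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# Both axes are pandas Index objects.
rows_are_index = isinstance(df.index, pd.Index)
cols_are_index = isinstance(df.columns, pd.Index)

col_a = df["a"]      # lookup on the minor axis, by column label
row_y = df.loc["y"]  # lookup on the major axis, by row label

# Transposing swaps the two axes: the columns become the index.
swapped = df.T

# Hypothetical regional breakdown: a MultiIndex on the columns.
columns = pd.MultiIndex.from_product([["north", "south"], ["sales", "cost"]])
regional = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)
north = regional["north"]  # selects every column under the "north" level
```

Selecting `regional["north"]` returns a smaller DataFrame whose columns are just `sales` and `cost`, which is the kind of slicing a hierarchical column index makes convenient.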
And so, for example, here, if we think of this visually: we have a pandas DataFrame. It's got columns a and b, with values one, two, three, four, five, six, at index labels x, y, and z. If I want to take that a and b up here and just pivot them to be part of the index, I want to take the minor axis and pivot it to be part of the major axis. That's just the stack operation. If I want to do the opposite, that's just an unstack operation. Now, when you stack and you unstack, you leave the existing major axis in place. So you're appending onto that major axis, and so you get a MultiIndex. If you want to throw away the original major axis, then you do a droplevel.

And so here, when I look at this, I can immediately see in my mind what happened, because I understand what this thing is. You had a and b, and x, y, and z. You took that a and b and you pulled it down here, so you ended up with a, b, a, b, a, b under x, y, and z. And then you threw away the top level, so you ended up with just a, b, a, b, a, b. So you have a pandas Series with the numbers one, four, two, five, three, six and the index labels a, b, a, b, a, b.

And when you think about it this way, you realize that all that fiddling, and all that searching through Stack Overflow to manipulate my pandas DataFrames, really is owing to a misunderstanding of what this thing means and an inability to grasp the fundamental structure of this thing.

And so I hope you've enjoyed this thinking exercise. This is the thinking exercise that underlies almost all of the consulting, the corporate training, and all the talks I give. I go as deep as I can, and on the way back out, I try and find meaning.
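The stack example described above can be sketched as follows. The frame matches the one described in the talk (columns a and b, values one through six, index labels x, y, z); the variable names are my own, and the assignment of one, two, three to column a is inferred from the resulting order one, four, two, five, three, six.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# stack: pivot the minor axis (columns) into the major axis (index).
# The existing index stays in place, so the result has a MultiIndex:
# (x, a), (x, b), (y, a), (y, b), (z, a), (z, b).
stacked = df.stack()

# droplevel: throw away the original major axis (the outer level).
flat = stacked.droplevel(0)

# unstack: the opposite move, pivoting the inner level back to columns.
roundtrip = stacked.unstack()
```

After these steps `flat` is a Series with values 1, 4, 2, 5, 3, 6 and index labels a, b, a, b, a, b, and `roundtrip` has the same shape and values as the original frame.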
I try and find applicable meaning. Because ultimately, it is not quite clear: do the details matter? I think the details do matter, but I think the details matter insofar as they allow us to understand this meaning. Meaning is about what I, as a human being, can derive from this that allows me to convey something to somebody else, or make a decision better, or quickly structure a bunch of details into things that I really need to know and things that I just look up.

For example, I understand fundamentally stack versus unstack, pivot versus melt. But all the keyword arguments that they take in pandas, I've got to look those up every single time. Does it take an inplace? Does it not take an inplace? I look that up every time. But the meaning here allows me to say: I know what this thing does structurally, so I know exactly where to look. And then, yeah, I look up the details. Is it a keyword argument? Is it a positional argument? What are the names of the arguments? Does it have an inplace or not? Does it make a copy or is it a view? And so forth and so on.

Because ultimately, when you're talking to people who are programmers only because programming is what they need to do to do their job (the data scientists, the optical engineers, the network engineers, the physical scientists), the first question on their mind whenever they look at the details or at technical sophistication is: Why should I care? How does this improve my life? How does this give me something better? Why go into these little gimmicks and these details? How is it not just a dalliance, something fun if I have some spare time? How is it something that actually is valuable for me?
And the answer to how this improves my life is: you go all the way down there, and on the way back out, you start to see the bigger picture. You start to see the meaning behind these things. You start to see how to structure this information, so it's not just a bunch of details, but there's a clear path for you to distinguish the underlying ideas from the details that corroborate or supplement them. Because in the end, what's really most important is: what does it all really mean? I'm James Powell. Thank you very much.

So that was really an amazing talk. Thank you so much for your great insights. Here you have the first question: how do you structure an advanced conference talk that still suits all kinds of audiences?

I wasn't intending to do that. What I was intending to do here was to share with you the only real gimmick I have, which is: how do you figure these things out? How do you derive some meaning, some understanding of these things? How do you move beyond just a bunch of details? Because in reviewing a lot of educational material around Python, I've found it's often very poor. It just overwhelms the person with facts, and there's no structure behind those facts. There's no meaning behind those facts. It doesn't give them any greater guidance. And so I thought what I would share with you is how you might be able to develop that yourself, or some of the steps along the way. And I think that what you often see is that when you really have this deep-seated, intuitive understanding of the thing, it's not actually very difficult.
One of the things that we do is we have a corporate training curriculum, and it's called Fundamentals of Programming. And it actually goes into much more sophisticated things than the actual advanced Python training that we do. Because it turns out that when you talk about modalities, or early binding versus late binding, mutability versus immutability, laziness versus eagerness, root-level code versus leaf-level code, these are all things that people can actually pretty much intuit. They're pretty straightforward. You can motivate them very easily. What really traps people is all the millions of little details. If you talk about Python's object model in terms of protocols, you can, within a couple of minutes, figure out: okay, this is how the thing is designed, this is how the thing is supposed to be used. And then you can spend the next two weeks of your time trying to memorize all the different underscore methods that exist. So what I would say, to try to answer this question as directly as possible, is that I actually think some of the really interesting advanced stuff is surprisingly accessible to a novice audience, if you can find that right meaning in the way you teach it.

So the question was about an example of code in production where an issue was caused because the code only "kind of works" instead of working as explicitly intended.

What you should do is see if you have any friends who are data scientists and ask to look at their code. And I say that in jest, because the truth of the matter is: are people who are data scientists or physical scientists or non-programmers really motivated to write good code? Maybe, maybe not. They're motivated insofar as, if the code breaks in production, maybe that's something that affects them. But it's not typically something their organization values.
They're not promoted based on writing the best code. They don't publish more papers when they write good code. And so, as a consequence, it's very difficult to really convey to these people why it matters. And so hopefully part of this talk was: yeah, you have all these little niche details, all these little pieces, but when you come back out the other side of it and you look at these things, yeah, it kind of matters. If I choose a tuple versus a list, that says something different, and it means something different. Some things are easier, some things are harder. The distinction is actually very clear, very stark, and maybe next time I make that choice, I can make that choice right.

And so what I would say is: the amount of bad code out there that's bad because the person didn't really put in the effort, or because the person just couldn't write good code, is not that high. The amount of bad code that's out there because a person wasn't incentivized or motivated to write better code is where the main problem is. And that motivation, that incentive, and that meaning are surprisingly easy to convey, and it's surprisingly easy to get somebody to look at them. So hopefully, for any of you out there who are trying to figure out how to incentivize yourself in the code that you're writing, maybe you might be able to get it right the next time.

Great answers, James. We are over time, so I really want to thank you. Thanks a lot.