Okay, thank you, and thanks for the introduction, Peter. You already know what this is about, so let's get straight to the point. Today I'm going to talk about pandas, but only about pandas indexing in particular. The pandas index is a very powerful tool, and I think beginner tutorials very often skip even mentioning it, so this is a closer look at the index. We're going to do a little catch-up on indexing: how we can access data with the index, the index types, the MultiIndex, and a closer look at the DatetimeIndex.

The very beginning is just a little repetition to get everybody on the same page. pandas is basically built on Series. This is a simple example of a Series: we take some random integers and create a pandas Series, and basically it's like a list or an array of numbers. But one thing you see here: we already have this, and this is called the index, and it is a labeling. So it's not only a list of data as in Python, it's already labeled. It's actually a NumPy array here, which we can see from the data type. It's a NumPy array with labels, but I think most of you know that.

So how can we access data in a Series? It's a very Pythonic concept: we can access a value by its positional index, just as we would with a list, and we can slice just as we would slice a list in Python. We can also use a method for this, which is called iloc, but note that it takes square brackets, not parentheses, to slice stuff. So this is just a little warm-up. But as I mentioned, we already have labels in our index, even in a Series, so here we can also go and relabel it.
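
The warm-up described above can be sketched roughly as follows. This is a minimal illustration, not the code from the slides: the random values, the seed and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: ten random integers, seeded so the run is repeatable.
rng = np.random.default_rng(0)
series = pd.Series(rng.integers(0, 100, size=10))

print(series.index)  # the automatically created labels 0..9
print(series.dtype)  # a NumPy integer dtype

first = series[0]             # with the default index, label 0 is also position 0
head = series[2:5]            # slicing as on a list: positions 2, 3 and 4
also_head = series.iloc[2:5]  # the same through iloc, with square brackets
```

With the default index, labels and positions coincide, which is why the plain slice and the iloc slice return the same three values.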
What we're doing here is setting the index by assigning to series.index, and we just take the alphabet to relabel it. So now we have exchanged our numerical, zero-based index with letters. We can still access by position, which is the Pythonic way as with a list, but now we can even slice by labels. We ask for label D to F, so pandas will look up D and F and give the result back.

And now the part that is a little confusing for beginners ends, but I guess you probably already knew that. Can we slice multiple ranges at once? No, we can't, it's invalid syntax. So how could we solve it? There's also a function called concat: we can slice two series, concatenate them again, and we have our new series.

One more thing about pandas indexes: you would probably expect an index to be unique, and pandas indexes do not have to be unique, as we can see here. Here we are relabeling our series again, and I just took a word with repeated letters, followed by X, Y, Z, to relabel our series. Can we use the loc method to ask for what's at G? Yes, we can. Can we still slice it and say, hey, give me G to A? No, we can't, because pandas can only do this if we have something unique here, a consecutive run of unique values, as we can see here. If you use the loc method for X to Z, it works, because X, Y and Z are unique in the index. So it's really nice and powerful, but that's something you must be really aware of when working with pandas Series.

So now we know how to access data in a simple pandas index, and everybody's on the same page. What about two-dimensional or three-dimensional data? Let's have a deeper look at the index structure. As we learned, the labeling of a Series is called the index.
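
A small sketch of the relabeling, label slicing, concat and non-unique cases discussed above. The label strings, especially the duplicated ones, are made up for illustration and are not the ones from the slides.

```python
import pandas as pd

series = pd.Series(range(10))
series.index = list("abcdefghij")  # replace the numeric labels with letters

by_position = series.iloc[3:6]     # positions 3, 4, 5, i.e. labels d, e, f
by_label = series.loc["d":"f"]     # label slices include the end label

# Two separate slices glued back together with concat.
combined = pd.concat([series.loc["a":"b"], series.loc["h":"j"]])

# Labels do not have to be unique (hypothetical labels with "g" twice).
dup = pd.Series(range(7), index=list("gabgxyz"))
both_g = dup.loc["g"]              # returns both rows labeled "g"
x_to_z = dup.loc["x":"z"]          # works: the boundary labels occur once

try:
    dup.loc["g":"a"]               # "g" is ambiguous as a slice boundary
    slice_failed = False
except KeyError:
    slice_failed = True
```

The last part is the pitfall from the talk: a single duplicated label can be looked up, but it cannot serve as the start or end of a label slice in an unsorted index.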
It's created automatically if it is not given by the data set you're importing, or however you create or add your data to pandas. It can be reset or replaced, as I already demonstrated; it's fairly simple to replace or reset your index. There's also a reset method, which does all the lifting for you, so you don't have to give an explicit index. It may only contain hashable objects, so quite obviously you cannot put a set or a dict there to label stuff. The index itself can already have one or more dimensions. And beware, it can be non-unique. I usually work with unique indexes, but you can also do some fancy stuff with non-unique indexes, which we're not going to cover today.

We have multiple index types. We have the plain Index, which we just saw, with the labels of a Series. We have a MultiIndex, which I'm going to demonstrate later. We have a DatetimeIndex, which is actually my favorite, and we're going to talk about it a lot. We also have a TimedeltaIndex, an IntervalIndex and, most recently, in the latest pandas version, the CategoricalIndex has been added, which can also be very useful.

So what's the structure behind all this? The basic idea of the Series and the DataFrame is actually borrowed from the R language, which is the language of statisticians. The structure is: we have some data. Just as a reminder, the data, except for strings, is NumPy under the hood. So we have NumPy data types, and the Series are typed as well. It's not like a Python list, where we can have multiple types; it's strictly typed, and that's also where the performance comes from. A Series is a NumPy array with labels. And what's a DataFrame? It's basically multiple Series glued together by sharing the same index. So note that we have multiple Series, but we also have these labels there. And there's also a three-dimensional structure.
It's called Panel, but I only mention it to tell you that you can actually forget about it, because it has just been deprecated. You can achieve the same with MultiIndexes, so it was removed for simplification.

So, the DataFrame. It's two-dimensional data, which is fairly simple to imagine. Let's create a new data set here; it's just a set of random integers. We see our automatically created index again. The same applies to the column names. Each and every Series is also referred to as a column, and this is referred to as a row.

So how can we access data in a DataFrame? If we now ask for a positional index, we no longer get the row values. We get a column, because the DataFrame is indexed by columns first, so we get a Series out. Of course we can do slicing as well, but this is, I think, a logical break in the whole pandas API. It's very confusing, because if you slice, you get rows. I think that's a break, but once you get used to it, it's still handy. We can also use the iloc method, for example to slice just a part out of our DataFrame. So this is axis 0 and this is axis 1, and we can use iloc to slice a segment out of our DataFrame, which is really handy if we have bigger DataFrames.

Let's continue our adventure here: what if you want to slice two columns? I think it's very simple. We pass in a colon to say, as in Python, take the whole array, and then just ask for the columns. Sometimes all this axis stuff is a little bit confusing, and I really had a hard time remembering it when I was new to pandas. I stumbled across a nice Eselsbrücke, as we call it in German, which is just something to help you remember stuff.
Axis 0 is the rows, which run horizontally, and axis 1 is the columns, which run vertically, and it's fairly easy to remember because a column looks just like a 1. This is how I remembered it, because I'm also one of those guys who always go: oh, left, right, right, left, I'm sorry.

Okay, let's go further. I have a little trouble reading here; let me reconfigure my setup a bit, sorry. Okay. Let's relabel our index and the columns, and it's fairly simple, as demonstrated before. Here I'm just passing in a function to rename our rows and columns, with a letter like R and a number with a leading zero, so it's a little more memorable than just working with numbers. And of course we can still access the columns just as before: if we ask for C05, we get the fifth column, well, actually the sixth. The same of course applies to accessing the rows, and the same to accessing segments. So this is the same logic as applied before with positional values.

And how can we now add data to pandas? We often have data from multiple sources, so how can pandas help us glue together data from multiple sources? Here the index becomes really handy. For example, here I'm just adding a new Series; it's called C10. And Peter, I lost my timer, sorry. So what do we have here? I create a new Series, and you see the labeling is a little bit off, and we add it to our DataFrame, which is already in place. We just say: DataFrame, please add a new column called C10, and we pass in the Series we created here. And you see, we end up with NaN values here, and we also lose labels, because the index just does not match.
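
The DataFrame access patterns, the relabeling and the misaligned new column described above might look like this. It is a sketch under assumptions: the 10 by 10 shape, the random values and the labels of the new Series are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(10, 10)))

column = df[0]                 # a single key selects a column (a Series)
rows = df[0:2]                 # a slice selects rows
segment = df.iloc[2:4, 5:8]    # rows 2-3 and columns 5-7, by position
two_cols = df.iloc[:, [1, 3]]  # the colon takes all rows, then two columns

# Relabel rows and columns by passing functions to rename.
df = df.rename(index=lambda i: f"R{i:02d}", columns=lambda c: f"C{c:02d}")
col_c05 = df["C05"]                       # the sixth column, counting from zero
block = df.loc["R02":"R03", "C05":"C07"]  # label slices include both ends

# A new column whose labels only partly match the index of the frame.
new_col = pd.Series([1, 2, 3], index=["R00", "R01", "R99"])
df["C10"] = new_col  # aligned on the index: unmatched rows become NaN
```

The assignment keeps the index of the existing DataFrame: rows without a matching label get NaN, and the label R99, which only exists in the new Series, is dropped.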
This can be really handy for joining multiple DataFrames, for example, because we can also be a little more explicit about how we want to join the data. Here we do the same, and we just say how to join. This is the same logic as in SQL databases. We ask for an inner join, and then we only get the subset where both indexes match, and the rest is just dropped from our DataFrame. And of course, if you apply something like that, pandas always returns a copy of the data. So if you want to keep this structure, you have to store it in a new variable or overwrite the variable you're working with, which is sometimes forgotten.

What else can we do? Of course, we can also do outer joins, and here I'm using another really handy option, which is inplace=True, because inplace instructs pandas to apply the changes to the DataFrame we're currently working on. For an outer join, we just say: hey, join everything. We receive everything, and wherever we have no values, pandas automatically adds NaN values. And there's another really handy thing: we can also instruct pandas to ignore errors if we want to join and something throws an exception. So this is this nice example here.

So how do we get rid of data? We can use the drop method. Say I want to get rid of this column. Of course, we could just slice the columns, but what if it's the third column, the fifth, the tenth, the twentieth? You could ask for them explicitly and join the data later, but the drop method is much handier. So I just want to get rid of the newly created column C10, and we ask pandas to drop it. But if we don't put the ignore option there, it will throw an error, because the column might not be present. So ignoring errors can be really handy if you are not sure whether a column is present in your data.
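
As a rough sketch of the joins and the drop call discussed above: the two small frames and their labels are hypothetical, and DataFrame.join is used here as one way to express the inner and outer joins, which may differ from the exact calls on the slides.

```python
import pandas as pd

left = pd.DataFrame({"C00": [1, 2, 3]}, index=["R00", "R01", "R02"])
right = pd.DataFrame({"C10": [10, 20, 30]}, index=["R01", "R02", "R03"])

inner = left.join(right, how="inner")  # only labels present in both indexes
outer = left.join(right, how="outer")  # all labels, NaN where a value is missing

# join returns a new frame, so left itself still has no C10 column.

without_c10 = outer.drop(columns=["C10"])  # a copy; outer still has C10

# Dropping a column that is already gone raises KeyError ...
try:
    without_c10.drop(columns=["C10"])
    drop_failed = False
except KeyError:
    drop_failed = True

# ... unless errors are ignored.
still_fine = without_c10.drop(columns=["C10"], errors="ignore")

# inplace=True changes the frame itself instead of returning a copy.
outer.drop(columns=["C10"], inplace=True)
```

Storing the result in a new variable, as with inner and without_c10, is the pattern the talk warns is easy to forget.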
So, let's go to the MultiIndex. The MultiIndex is also a fairly simple index structure I want to introduce you to. Now we have a slightly different data set. We could imagine it's hotel prices: we have a city, there's a price, there's a certain rating, and the city is located in some country. So this is a fairly easy data set. We have some major cities here, and my hometown, Monheim, as well; I took the liberty of adding that. Now let's see what we can do with that.

Well, we group, because many people are not aware that if you do a groupby on several keys in pandas, you actually get back a MultiIndex. So this is the groupby, and we ask for the mean. As you probably know already, pandas will take all the columns whose data types allow computing a mean, so we see this is just the rating, and we already see a hierarchical structure here. We ask to group by country, city and category, and we pass this in as a list, and pandas creates this hierarchical data structure in the same order as we do the grouping: we have the country, the city and then the category, and the mean values we were looking for. This looks really nice, but it might be a little confusing. How can I access, for example, the data for the cities? Of course you could ask for these values and walk down the path of the hierarchical index, but we can access it a little better.

So let's have a closer look at the index. It's really easy to look into the index in pandas, just by asking for .index. So what do we have here? We see we have a MultiIndex, and it also indicates that we have levels.
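
A minimal sketch of such a grouping and of looking into the resulting index. The hotel rows, the cities and the numbers are invented for illustration; only the column roles (country, city, category, price, rating) follow the description in the talk.

```python
import pandas as pd

# Hypothetical hotel data.
hotels = pd.DataFrame({
    "country": ["Germany", "Germany", "Germany", "France", "France"],
    "city": ["Berlin", "Berlin", "Hamburg", "Paris", "Paris"],
    "category": ["budget", "luxury", "budget", "budget", "luxury"],
    "price": [80.0, 250.0, 90.0, 110.0, 400.0],
    "rating": [3.5, 4.8, 3.9, 3.2, 4.9],
})

# Grouping by a list of keys yields a hierarchical index in the same order.
means = hotels.groupby(["country", "city", "category"]).mean()

print(means.index)         # a MultiIndex
print(means.index.names)   # country, city, category
print(means.index.levels)  # one list of unique labels per level
print(means.index.values)  # the entries are plain tuples

cities = means.index.get_level_values("city")  # all labels on one level
germany = means.loc["Germany"]                 # everything under one country
berlin_luxury = means.loc[("Germany", "Berlin", "luxury")]  # walk the full path
```

Selecting with a partial key such as "Germany" returns the remaining levels as the index, while a full tuple returns a single row.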
So we have three lists, which are the levels, and we can also ask for the index levels, so it's really easy to look into the data by level. We can also ask for the names again, so pandas is very explicit about what's stored. And we can ask for the index values, and here you see how the MultiIndex actually works, because it's just tuples: tuples of the country, the city and the category we are looking for. This is fairly simple, a structure we can work with in our minds to get to the data.

We can also directly access the data by asking for the values by level. Here we are asking for all the values we have on level two, and the same applies to level one. I think this is fairly simple. These are just two more examples: we can also use the loc method to ask for all the data stored on the first level, here the country. So here I'm asking: please give me back all the data for the country Germany. And we can even access the hierarchy by passing in a list, and pandas will match the list we pass in against the tuples it has stored in the MultiIndex. So it's fairly simple to access the data you want.

I really want to spend some time talking about my favorite index, which is the DatetimeIndex, because almost all data has some timestamp on it. Let me introduce a data set for this little exercise. It's a fairly simple data set: just a timestamp and a temperature value, taken from an open data set from the city of Aarhus. This is what the data looks like when we just plot it as it comes in. Now let's create our DataFrame with the DatetimeIndex, which is fairly simple.
We can use the to_datetime function, which is built into pandas and is there to parse datetime values. It does most of the heavy lifting for you, but you can also be very explicit about how your datetime string is structured. We just rely on this format here; this is the default format. And now we have an index. What do we discover once we have created the index? If we do the same plot again: before, we said, oh wow, this is going up and down fairly randomly, and now we see, okay, this looks more like a time series of temperature values across multiple days. That's one of the great things about pandas: everything works really well together, so we don't need to instruct anything in matplotlib about how or in which order we want to present our data. Once we have a DatetimeIndex, pandas does the heavy lifting for us in matplotlib.

So what else can we do? This is a closer look at the DatetimeIndex, so you see the actual timestamps here. You can also notice that we have a datetime data type; timestamp, that's the name of the index; and this is the length, the count of values. The DatetimeIndex also supports frequencies, which we're not going to cover today, but it's fairly neat to work with frequencies in pandas.

So let's group: we take the timestamps we have in the index and use the index for grouping the data, and let's count. As you already see here, we are not asking for the index as such, which has one-second granularity; we are just asking for index.date, and it's already built in, so we can easily use the DatetimeIndex to group data by days without doing anything else. And we can do the same with other attributes: there's also the week, and we can chain the methods here, ask for the mean, and plot it.
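
The steps above can be sketched as follows. The five readings are made-up stand-ins for the Aarhus data, and the week number is taken from isocalendar(), which is an assumption about how to get the week in current pandas versions rather than the call shown on the slides. Plotting is left out so the example has no further dependencies.

```python
import pandas as pd

# Hypothetical readings standing in for the temperature data set.
raw = pd.DataFrame({
    "timestamp": ["2014-08-01 06:00:00", "2014-08-01 14:00:00",
                  "2014-08-02 06:00:00", "2014-08-02 14:00:00",
                  "2014-08-09 14:00:00"],
    "temperature": [14.0, 22.0, 15.0, 24.0, 20.0],
})

# Parse the strings; the format can be given explicitly.
raw["timestamp"] = pd.to_datetime(raw["timestamp"], format="%Y-%m-%d %H:%M:%S")
df = raw.set_index("timestamp").sort_index()

print(df.index)  # DatetimeIndex with dtype, name and freq

# Group by calendar day using only the index, then count the readings.
per_day = df.groupby(df.index.date).count()

# Group by ISO week number and take the mean.
weekly_mean = df.groupby(df.index.isocalendar().week).mean()
```

Chaining .plot() onto per_day or weekly_mean would give the kind of charts described in the talk, provided matplotlib is installed.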
We can also use the index to ask what are weekdays and what are weekends. This is also a little logical break in pandas, in my opinion. For example, if you parse datetime strings, pandas is very friendly to US dates, and as you of course know, the US is the only country with month, day, year, which can be really troublesome. But here pandas is zero-indexed and zero is Monday, whereas in the US it should be Sunday, so this is more the European way to count weekdays.

So what are we doing here? We are getting the day of the week from the index, and then we use a Boolean index to ask for the days five and six, so we get the weekends back. Then we put everything together and ask for the hour of the data we have combined here. So we can find in our data set that the temperature on the weekends is higher, which I think is a good sign if you live in Denmark, so these were probably sunny weekends. But it's a small data set; it has no significance.

What else can we do? We can also ask for a date: we can pass in a whole month as a string, just a year and a month, and get the temperature plotted. So it's a very, very powerful index. It really saves you a lot of time making up your mind about what lambda functions or anything else you would need: once you have a DatetimeIndex, dates and times are at your fingertips.
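
A short sketch of the weekend selection and the month lookup described above. The six-hourly index and the synthetic temperatures are assumptions made so the example is self-contained.

```python
import pandas as pd

# Hypothetical readings every six hours, starting on 1 August 2014.
index = pd.date_range("2014-08-01", periods=160, freq=pd.Timedelta(hours=6))
df = pd.DataFrame({"temperature": 10.0 + index.hour.to_numpy()}, index=index)

# Monday is 0, so 5 and 6 are Saturday and Sunday.
is_weekend = df.index.dayofweek.isin([5, 6])
weekend = df[is_weekend]   # Boolean indexing keeps the rows flagged True
weekday = df[~is_weekend]

# Mean temperature per hour of day, weekends only.
weekend_by_hour = weekend.groupby(weekend.index.hour).mean()

# A year-month string selects the whole month.
august = df.loc["2014-08"]
```

The string lookup works because the index is a DatetimeIndex; on a plain string index the same call would look for a label spelled exactly "2014-08".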
We can also ask for ranges, slicing by dates, which I think is pretty neat and very useful. And this is probably not as useful, but just to show you: we can also ask for the hour of the index and combine conditions. This is an "and" statement, so we ask for all the data in our DataFrame where the hour was greater than 12 o'clock but only until 16:00 hours, and then we can plot it. I don't know whether there's a use for it, but I think it's good.

And of course, once you have the data with a DatetimeIndex, you can also do resampling, which is super cool. So let's resample a little bit. Here's our real data set. We call the resample method and pass in D; D means resample by day. Then we can aggregate the data and ask, okay, what's the maximum, and we immediately get back the maximum values for each and every day, in a resampled fashion. We can do the same by month, which is M, which is quite accessible. We can also resample our DataFrame by day and ask for an aggregate; the agg function is also really handy, because we can ask for the minimum and the maximum and plot them. So this is the minimum and maximum value for each and every day.

The last and most useful thing about resampling I want to show you is that we can even resample by three days, so we're very flexible in the intervals we can sample. If you want three days, one day, 12 hours, 11 hours, anything: this is super flexible. I thought it was a little hard to find out what is what, so let me present this slide with all the ways you can resample. I've taken the liberty of putting the ones I found most useful on the left side. But pandas was actually developed at a hedge fund, where Wes McKinney was working when he started pandas, so you have a lot of business time frames there as well. So you can resample by anything you can probably think of.

And that's the end of my little presentation. Thank you very much for your attention.

So thank you, Alexander. I think we have one quick question. Okay.

Just a question: this timestamp is limited by the NumPy datetime type, which is based on nanoseconds. Do you have any idea whether you can use a different time unit, because I would like to span a bigger time range? I don't care about nanoseconds; seconds would be totally fine. Do you have any idea whether this is possible?

No, actually, I was really happy with the DatetimeIndex so far; I have never stumbled across that. But let me know if you find something.

Okay, thanks again.
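
Going back to the date slicing and resampling part of the talk, the calls might be sketched like this. The hourly sine-wave temperatures are synthetic stand-ins for the real data. The monthly alias is left out on purpose, since its spelling differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly temperatures for nine days.
index = pd.date_range("2014-08-01", periods=24 * 9, freq=pd.Timedelta(hours=1))
values = 15 + 5 * np.sin(np.arange(len(index)) * 2 * np.pi / 24)
df = pd.DataFrame({"temperature": values}, index=index)

# Slice a range of dates; the whole end day is included.
date_slice = df.loc["2014-08-02":"2014-08-04"]

# Combine two conditions on the hour: after 12:00 and up to 16:00.
afternoon = df[(df.index.hour > 12) & (df.index.hour <= 16)]

daily_max = df.resample("D").max()                    # one row per day
daily_min_max = df.resample("D").agg(["min", "max"])  # several aggregates at once
three_day_mean = df.resample("3D").mean()             # a multiple of the alias
```

Any of these results can be plotted directly, and a multiple such as "3D" is how the flexible intervals mentioned in the talk are expressed.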