Hi, this is Brandon Rohrer, and this is a minimalist guide to slicing and indexing pandas data frames. There are a lot of ways to pull elements, rows, and columns from a data frame. If you're feeling brave sometime, check out the seven-part series on pandas indexing linked below. Some indexing methods look very similar but behave very differently. The goal of this post is to identify a single strategy for pulling data from a data frame that's straightforward to interpret and produces reliable results. Just a heads-up: these are my own thoughts only. There's no guarantee that they're authoritative or even right.

In case you want to skip to the end, here's the bottom line: one, use .loc for labels; two, use .iloc for positions; and three, explicitly designate both rows and columns, even if it's with a colon. We'll step through some examples to illustrate these. Below is a link to the Python script if you'd like to run them yourself.

To start, we'll create a small data frame using data from Wikipedia on the highest mountains in the world. For each mountain, we have its name, its height in meters, the year it was first summited, and the range it belongs to. If this is your first exposure to a pandas data frame: each mountain and its associated information is a row, and each piece of information, for instance name or height, is a column. Each column has a name associated with it, which pandas calls a label. The labels for our columns are name, height in meters, summited, and mountain range.

In pandas data frames, each row also has a name. By default, this label is just the row number, counting from zero. However, you can set one of your columns to be the index of your data frame, which means its values will be used as the row labels. We'll set our name column as the index.

A common operation is to pick out one of the data frame's columns to work on. To select a column by its label, we use .loc, which is an indexer used with square brackets rather than called like a function.
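As a rough sketch of this setup, the snippet below builds a four-mountain data frame and makes name its index. The values (heights rounded to the nearest meter) and the column labels name, height, summited, and range are my own assumptions for illustration, not copied from the original script.

```python
import pandas as pd

# Hypothetical stand-in for the example data: four of the highest mountains.
# Heights are approximate (meters); the column labels are assumptions.
mountains = pd.DataFrame(
    {
        "name": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse"],
        "height": [8849, 8611, 8586, 8516],
        "summited": [1953, 1954, 1955, 1956],
        "range": [
            "Mahalangur Himalaya",
            "Baltoro Karakoram",
            "Kangchenjunga Himalaya",
            "Mahalangur Himalaya",
        ],
    }
)

# Before setting an index, the row labels are just positions counting from 0.
print(mountains.index.tolist())  # [0, 1, 2, 3]

# Use the name column as the index: its values become the row labels,
# and it is no longer one of the regular columns.
mountains = mountains.set_index("name")
print(mountains.index.tolist())    # ['Mount Everest', 'K2', 'Kangchenjunga', 'Lhotse']
print(mountains.columns.tolist())  # ['height', 'summited', 'range']
```

After set_index, the names live in the index rather than in a column, which is why they show up as row labels in every selection that follows.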
One thing that makes our commands easy to interpret is to always include both the row index and the column index we're interested in. In this case, we want all of the rows, so we show this with a colon. Then, to indicate the column we want, we add its label. The command mountains.loc[:, 'summited'] gets us just the summited column. It's worth noting that this command returns a Series, the pandas data structure used to represent a single column. If, instead of a Series, we just want an array of the numbers in the summited column, we can add .values to the end of this command (newer pandas documentation recommends .to_numpy() for the same purpose). That returns a NumPy array containing 1953, 1954, 1955, and 1956.

If we want only a single row, we can use .loc again, this time specifying a row label and putting a colon in the column position. If we want only a single value, for instance the year K2 was summited, we specify the labels for both the row and the column, as in mountains.loc['K2', 'summited']. The row always comes first. While it's true that you can get away with passing only one argument to .loc, it's most straightforward to interpret if you always specify both the row and the column, even if it's with a colon.

We don't have to limit ourselves to a single row or a single column with this method. Here, in the row position, we pass a list of labels, which returns a set of rows rather than just one. We can also get a subset of the columns by specifying the start and end columns with a colon in between. In this case, 'height':'summited' gives us all of the columns between and including the start point, height, and the end point, summited. Note that this is different from numerical slicing in NumPy, where the endpoint is excluded. Also, because we've already set the name column as the index, the names come back as the row labels of the data frame we get back.
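Here's a minimal sketch of the .loc patterns just described. It assumes a hypothetical four-mountain data frame whose labels (height, summited, range) are my guesses at the original script's, and the pair of rows in the list selection (Mount Everest and Lhotse) is an arbitrary choice.

```python
import pandas as pd

# Hypothetical example data; labels and values are illustrative assumptions.
mountains = pd.DataFrame(
    {
        "name": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse"],
        "height": [8849, 8611, 8586, 8516],
        "summited": [1953, 1954, 1955, 1956],
        "range": ["Mahalangur Himalaya", "Baltoro Karakoram",
                  "Kangchenjunga Himalaya", "Mahalangur Himalaya"],
    }
).set_index("name")

# All rows (colon), one column by label: returns a pandas Series.
summited = mountains.loc[:, "summited"]

# .values gives a plain NumPy array (.to_numpy() is the newer equivalent).
years = mountains.loc[:, "summited"].values

# One row by label, all columns (colon): also a Series, indexed by column label.
k2_row = mountains.loc["K2", :]

# A single value: row label first, then column label.
k2_year = mountains.loc["K2", "summited"]

# A list of row labels plus a label slice of columns.
# Label slices include BOTH endpoints, so this keeps height and summited.
subset = mountains.loc[["Mount Everest", "Lhotse"], "height":"summited"]
print(subset)
```

Every selection spells out both positions, so a reader never has to guess whether a lone argument meant rows or columns.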
We can also select rows or columns where a value meets a certain condition. In this case, we want to find the rows where the value in the summited column is greater than 1954. In the row position, we can put any boolean expression that has the same number of values as we have rows, for instance mountains.loc[mountains.loc[:, 'summited'] > 1954, :]. We could do the same for the columns if we wished.

As an alternative to selecting rows and columns by their labels, we can select them by their row and/or column number. The ordering of the columns, and thus their positions, depends on how the data frame was initialized. The index column, our name column, doesn't get counted here, so position 0 refers to the height column. To select data by its position, we use .iloc. Again, the first argument is for the rows and the second is for the columns. To select all the columns in the zeroth row, for instance, we write mountains.iloc[0, :]. Similarly, we can select a column by position by putting the column number we want in the column position of .iloc. We can pull out a single value by specifying the positions of both the row and the column. We can pass a list of positions if we want to cherry-pick certain rows and/or columns. We can also use a colon slice to get a contiguous set of rows or columns by position. Note that unlike .loc with labels, .iloc with positions does not include the endpoint. In this case, 0:2 returns only columns 0 and 1 and does not return column 2.

All of this can be summed up as follows. 1. Use .loc for label-based indexing. 2. Use .iloc for position-based indexing. 3. Explicitly designate both the rows and the columns, even if it's with a colon. This set of guidelines will give you a consistent, straightforward-to-interpret way to pull the data you need from a pandas data frame. Good luck with your data munging.
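Here's a minimal sketch of boolean selection and the .iloc patterns, again on a hypothetical four-mountain data frame. The column labels, the data values, and the specific positions picked out are my own illustrative assumptions, not taken from the original script.

```python
import pandas as pd

# Hypothetical example data; labels and values are illustrative assumptions.
mountains = pd.DataFrame(
    {
        "name": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse"],
        "height": [8849, 8611, 8586, 8516],
        "summited": [1953, 1954, 1955, 1956],
        "range": ["Mahalangur Himalaya", "Baltoro Karakoram",
                  "Kangchenjunga Himalaya", "Mahalangur Himalaya"],
    }
).set_index("name")

# Boolean mask in the row position: one True/False per row, aligned on the index.
later = mountains.loc[mountains.loc[:, "summited"] > 1954, :]

# With name as the index, positions count only the remaining columns:
# 0 = height, 1 = summited, 2 = range.
first_row = mountains.iloc[0, :]          # all columns of the zeroth row
heights = mountains.iloc[:, 0]            # all rows of column 0
k2_year = mountains.iloc[1, 1]            # row 1 (K2), column 1 (summited)
picked = mountains.iloc[[0, 3], [0, 2]]   # cherry-picked rows and columns

# Position slices exclude the endpoint: 0:2 gives columns 0 and 1, not 2.
first_two_cols = mountains.iloc[:, 0:2]
print(first_two_cols)
```

Comparing the last line with the label slice 'height':'summited' shows the one place the two indexers differ in shape: both return height and summited here, but .iloc gets there with an exclusive endpoint of 2.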