 I'm going to start. My name is Kojo Idrisa and here at the top of my slides I've got my name and my Twitter handle so if you have questions about this because I know everyone has deep burning questions about spreadsheets. It's what you spend your time thinking about as do we all. So this is my talk, the Python spreadsheet, State of the Union, July 2017. The reason it's called State of the Union, July 2017, is because I originally gave this talk in 2013, or a version of this talk in 2013. And so as it says here on the slide, it used to be, I'm currently a QA specialist for, I can't highlight that. I'm currently a QA specialist for Decisional Health, which is a startup in Houston, Texas. And that's actually relevant in a second. I'll show you something why. So that's what I do now, but about 18 months ago, I just became a QA specialist about 18 months ago. So as seems to be the theme for this room, I'm some of those career changing people. And so before I was a software developer, I was an accountant and also a university instructor in a couple of different countries. So that's me. And the link there is to the GitHub repo where the slides and the iPython notebook for this presentation are. So you can take a look at that at your leisure. But the fact that I used to be an accountant is also relevant to this talk because this is one of the reasons I spent time dealing with spreadsheets. So here's a brief outline of the talk. We're going to talk about sort of how I got here to the point of wanting to stand in front of people and talk about spreadsheets, which is, you know, its own thing. Then also, we're going to demo some of the fundamental OpenPyExcel data types. OpenPyExcel is a Python library that I'll be demonstrating. And as the name suggests, it's used for working with spreadsheets, sometimes called Excel files, and specifically it works with XLSX, XLS or XLSX files. So you can't use it with the ODS files that LibreOffice generates. Then after some introduction of the fundamental data types, we'll look at some basic things you can do with spreadsheets and Python. And then we're going to look at some problems that you will run into if you're trying to work with spreadsheets programmatically. So it's not all peaches and creams, but you can learn from my pain and try to make things go more smoothly for yourself in the future. So my secret origin story. So I used to be a professional spreadsheet fighter, which is also known as accounting. You would be terrified by how many organizations around the world are literally run on spreadsheets. Just please try not to think about it. So I've had an interesting coding for a long time. You hear about people who have been coding since they were kids. I first got exposed to coding as a kid, but I was never a CS major, so I didn't need to code. I just had an interest in it, but I was sort of dabbling in stuff because I didn't need to do it. So in late 2012, I decided to get serious about becoming a better programmer and eventually trying to become a professional developer. And so I went pro in December of 2015, working for a decision of the company where I am now. There have been at least a couple of talks in this room about changing careers, and so I won't go into that in a lot of detail, but some of my experiences mirror those. Some of them are different. If you really want those details, you can talk to me at some point in the future. But on to the spreadsheets. So my role in the Python community, and one of the reasons I'm doing this talk, is I wanted to try to be able to make a contribution to the Python community. And again, as someone who's sort of coming into this later in life changing careers, I don't have a CS degree. I knew I wasn't going to be writing the wicked codes, as they say. But I thought, well, what can I do to make a contribution? How can I help? My approach is to try to help grow the Python community, and my feeling is one of the best growth vectors for the community is non-programmers. Not trying to convert Java programmers into Python programmers, but to try to find people who aren't programming, expose them to the benefits they can get from programming, and then show them how Python is perhaps a better tool to get those benefits. Also looking at solutions for people who aren't professional developers, I remember when, in the demonstration that I'll show you, I recall the first time I showed it to people. I presented this at my Python meetup, my local Python meetup in 2013. And one of their suggestions was, well, just put it in a database. Well, if you're someone who knew how to put something in a database and do SQL queries, like, you wouldn't have this problem. Often spreadsheets, and for those of you who work almost anywhere, most business applications is very broad, I know. The reality is most business applications are really just relational database systems with a GUI on front of them. There was a point-and-click interface on the front. And so you get a report from some system. When someone gives you a spreadsheet full of data, it's very likely that that just is a dump from a database. So put it in a database doesn't really solve that issue. So I'm trying to make a contribution. Who's the talk for the talk? We're looking at either spreadsheet users who want to try to, you know, step their game up to the next level. Spreadsheet users are often very proud of their ability to use pivot tables and do things like that. But, you know, there are, that's a whole different thing. There are other things that you can do because most spreadsheet programs sort of give you limits. So that, or if you're a Python developer who keeps finding yourself confronted with spreadsheets, people keep giving you spreadsheets and you have to do something with them. So that's who this talk is for. Not a lot of code here. The goal of this talk is to be very beginner-friendly. Again, my hope is to help grow the community either by, you know, by bringing in new people who maybe weren't programmers and then found, hey, you know what, I can write some code that will actually help me. At the same time, you all can make new friends in your places of work by maybe writing a little bit of Python to help some of your coworkers automate things and make things faster. And then, ta-da, you're the most popular person in your office. All the, you know, whenever there's like food comes to the office, it shows up on your desk first. And then, the rest of those terrible people who didn't help anybody, then they can pick through what's left. But food's been brought to you first. So this is, this is why I'm here to help you. Let me, see, we're gonna demo time, this demonstration sponsored by Jupyter Notebooks. That's what I'll be using, using a Jupyter Notebook. Let me first, to give you a quick look at, let's see, I told you I work for this company, Decisio Health. And so, this is my Jupyter Notebook. This is what my company makes. We make medical dashboards that get used in ICUs. And each of these sort of numbered things represent the patient. So if we, like, take a look at, say, this patient, they've got things going on. We see things like their heart rate, their blood pressure, see the heart beating. You notice the heart rate has a color yellow, the blood pressure has a color green. Here in this area, you see the respiratory rate. The colors that those numbers are, are determined by a JSON, a JSON configuration file. The data from that configuration file come from a spreadsheet. It's initially put into a spreadsheet. Doctors and clinicians sort of sit and decide, okay, what should these ranges be? Those numbers are put into a spreadsheet. Then, called that I've written, turns that into a JSON file that configures this. Is that the optimal way to do this? No, but it's the way that it's currently being done. We're moving towards other solutions, but this is one example of the nature of complexity that you can achieve. And we'll see an example of turning some spreadsheet data into JSON in just a little bit. This is actually, and since we, I'm here at Pi Ohio, one of the places where this is in use is at the University of Cincinnati Hospital. They use our product decision and so they use it in some of their ICUs, Intensive Care Units. And that's what this software does. So, just sort of a, sort of a, a live demonstration of what can be done. So, there you go. So, demo time. And we're going to start with, actually. So, instead of starting with the code, we actually want to start with the spreadsheet itself. So, this is the actual spreadsheet. Make that a little bit bigger. Wrong way. Okay. So, what we've got here is some simulated timesheet data. And so, this is similar to a task I did in 2012-2013. So, you've got simulated timesheet data. You've got an identifier for the employee, the cost under that employee works in, the division of the company that they work in, which is us, which is cost centers or subdivisions of the division. Their manager, and obviously there's like a nonsense name there. The date that they worked, the employee's name also kind of a nonsense thing anonymized. The number of hours they worked and whether or not they worked on our project, you'll find in professional services organizations like accounting organizations, engineering organizations, there's often a focus on are you working on a billable project or are you not working on a billable project. And so, that's what this data is designed to reflect. In this example, everyone's working on a billable project. And so, we're not doing that level of analysis. Here's more of an aggregation. I made a simplified example just for the purposes of our demonstration here. But that speaks to one of the things you have to look at when you're dealing with working with spreadsheets programmatically. You need to understand the actual data that's there. What underlying data is in your spreadsheet? You can't just write code that makes things happen unless you actually understand the data. So, this is the data we'll be working with. I've got a thousand rows of data here. And so, you can see, for instance, for this person, you can see that they've worked, it's the same employee ID number with the same department number, the same cost center number. These are the different hours they've worked, they've built on this project. So, this is, let's see, so on this day they work four hours, on this next day they work four hours, so on and so forth. So, that's a summary of the data we're working with just to provide some context of the analysis that we'll be doing. So, first things first, we take a look at reading in a file and then some basic data types. And I'm going over the basic data types because OpenPyXL provides some different data types beyond the standard Python types. So, first things first. And again, the talk is geared to be as beginner friendly as possible, so if, you know, I already know how to import things called Joe. Well, fine, maybe you do, but perhaps you have a friend who might not be so familiar, so we're just going to go through some of that stuff. So, Import OpenPyXL, that's the name of the library, then you pull in this workbook module and that lets you work with workbooks. And then once you've done that, you have this load workbook function and this is the name of the file that I just showed you. It's a demo file. Here I've got data equals, data only equals true. What this does is if you have a formula in your spreadsheet, this would just give you the result of the formula, the last time it was calculated, as opposed to bringing in the formula. And that's an important setting. Now here I want to talk a little bit about workbooks versus worksheets versus spreadsheet versus tab. And I point these out because people tend to use these terms interchangeably and OpenPyXL tends to use them very specifically. So, people just call the spreadsheet a spreadsheet and then inside the spreadsheet you have tabs, little tabs and spreadsheets. An actual spreadsheet file like an XLS file is called a workbook, so a collection, a complete spreadsheet file is called a workbook. Within that workbook you have work, individual worksheets. So here, and this might not be unfortunately, this might not be visible to people in the back, but you've seen a spreadsheet before at the bottom that you've seen the tabs along the bottom of a spreadsheet, each of those is an individual worksheet. So here when we import workbook and then load workbook we're getting the entire workbook file and then we move on to working with worksheets. So here we've got workbooks and if I print workbook, so I was a terrible idea to do live coding but I'm doing it anyway because I am unwise. You'll notice here when I print, do print workbook it gives me this object, this OpenPyXL workbook object. Again this is one of the data types that OpenPyXL provides and if we take a look at the dir command and if you are a new, how many of you by sure of hands we can sit to yourselves, either new to Python or new programmers? Okay, so for the people who are newer, new to Python or new to programming, this dir function is very helpful. What this will do is for almost anything in Python, type in dir, the dir function and pass it, it will show you what are called the attributes for that object, so these are the things that you can do with it. So here this WB, when I loaded this workbook in, it's my workbook object so what can I do with that workbook object? And I'm going to ignore these Dunder methods and these single underscore methods. I'm going to focus on the named ones here so you can see the workbook is active, you can take a look for instance, you can create a sheet in it. What we're interested in is the fact that you can get a sheet by name or you can get the names of the sheets in that workbook. So again, that's why I made that distinction between workbook versus worksheet. Pretty good which is another good question. I know I have run into some spots where the help doesn't work. Let me get to that in just a second. So we've got some methods here that you can use with the workbook. Let's see, what else am I looking for? I think that's going to be it for now. Oh, also sheet names is the one where I should use. So you can get the names of the sheets that are in that workbook. And to get to that question, let's take a look at the help function also almost as useful as the Durf, well not also. Just as useful as a Durf function, it just does things slightly differently. Most Python functions will have what's known as a Dockstring that will include some information about how to use the function. So if you do help for that WB, you get help for the workbook and it gives you some idea of how to use that object. So you can do this for Python built-ins as well. Ah, let's see, I'm in your browser. I'm like I need to open a web browser. No I don't. The open, so read the docs. And if you're not familiar with read the docs, you probably are, you just don't know you are. A lot of open-source documentation is on this site, read the docs. There are a number of wonderful conferences about it, but so openpyxl.readthedocs.io, this is where the documentation for open pyxl is. There's a tremendous amount of documentation for it. There is a lot that the dictionary can do. I'm just sort of covering some of the basic stuff because there's literally not enough time to cover all of it. But that's where you can find some more details. No problem. All right, so we've got, so we've got this workbook. And now we want to know, okay, well what sheets are in the workbook. And so we see here we've got the simple read, simple write, visual read, Japan spending. I was in Japan like two weeks ago. And so I was I was working on this and also like trying to figure out where all my money went. And so this is a good thing for spreadsheets. Clean data is the one that we're most interested in. Well, you know what? It's the one we're second most interested in. The one that the one that everyone is most interested in is kpop draft roster. Unfortunately, that we won't be using that one today. Hopefully I'll have that something ready to go with that. I'm giving this Hockey DjangoCon. Hopefully I'll have something ready to go by then. But until then you can find kpop on YouTube to tide you over. One thing I will point out is if we take a look at the actual spreadsheet itself, for people in the front row you'll notice that the... I don't know what's happening. This is why you shouldn't have your friends come to your talks. People in the front row you'll notice that the Japan spending tab isn't visible. That tab just like if you've worked with spreadsheets you know that you can sometimes hide a sheet. And so that sheet is hidden here, but open pi excel when it gives you the sheet names it still finds it there even though it's visually hidden. So just something worth pointing out is like you can't hide a sheet and then think no one will find it. Yes, they will find it. That's not how espionage works. I have not worked with anything that was very hidden. I've seen secured sheets, but then I don't know. So that is the situation where I think you have to dig into the docs a little more. I wouldn't be surprised, but I haven't gone that deep into that particular rabbit hole. All right, so let's see. So we've done that and so now we want to build this demo worksheet. And so this is the actual sheet we'll be using for the demonstration. So for that I'm doing that WB, that get sheet by name method that we saw earlier and I'm using the name that I got from the show names function before. So I want the clean data sheet and that's going to be my demo worksheet and that is this sheet here. This is the clean data sheet. That's just what I named it. So give it that string and then we've got the same help and dur function here for the demo worksheet and we'll take a look at some of that. Again, there are many, many attributes. Oh, was it just somebody who's very hidden? Oh, look at that. All right, so like I said, I mean there are, and I'll get into this in just a second, but there are some maybe additional complications that you run into when using fmpi excel, but they're there for a reason because of things like the fact that a sheet could be hidden or it could be very hidden or there are all these other attributes that come. It's not just the data that's inside. So here in the sheets we are going to see the active cell. You can get a particular cell, I'll dig into cell a little bit later. The columns in a sheet, so there are a bunch of different attributes here. The rows, we're going to be focusing on the columns and the rows and the cells, but you can also give them, you can change the title, you can read the title, and then values. The values attribute is where you're going to get the actual data that you're after. So, here we go, so there's our worksheet, and so now we look in the rows. Each worksheet is going to have rows. You've seen spreadsheets that have rows that go this way, columns that go that way, and so here, when we look at the demo worksheet with rows, it returns a generator object, and so it doesn't just give you back a bunch of rows, it gives you a generator object. If you are new to Python or programming, a generator object is similar to, say, a list, or tuple, which is like a collection of things, but let's say you have a collection of 1,000 things. If you've got a list of a thousand things that takes up a lot of memory, a generator object says, okay, I've got a thousand things here, but I'm going to give you them, give them to you one by one. It's a simplification, but that's effectively how they work, and so what OpenPyExcel does is it doesn't give you back that whole list of rows, it gives you back a generator object that will give you one row at a time for memory savings, but that's important because you would work with a generator object differently than you would work with a whole list or a whole tuple. So, your rows is a generator object, and I've got a little bit of a difference here, so, so what is in that, what is that generator object giving us? The generator object is generating tuples, and so a tuple, if you're, again, if you're sort of new to Python, it's a collection. The difference between a tuple and a list, a tuple is a collection of objects where the order has meaning. So the first object in a tuple has some sort of meaning, relative to the second one. In this case, you'll notice that the first thing in the tuple you've got clean data, a1, b1, c1, through h1. It's the first row, and if we take a look at our actual spreadsheet, you'll notice it goes from a to h, and so it's the first row, then the second row, so on and so forth. So each tuple represents a row. So a generator object that gives you back tuples, but not tuples of the actual data values you'll notice, but it's of those cells. So cell, clean data.a1, and so the cell itself is a different data type, and again I point those out because you work with those a little differently. So what's going on with the cell? So when you do this, here I've got, so for cell in this next demo worksheet dot row, so you recall worksheet dot rows gives you that generator object. As I said, a generator gives you one item at a time, so to just have one item pop off that generator object, you use this next function. And so what this is doing is it's giving me, and let me have it here for the purposes of the video, what this is giving me is it's giving me the first row from that generator object, and then and I'm calling it cell, and so I'm saying give me cell zero and one where I'm assigning the cell column in the cell row, and then I'm printing the cell itself, the type of the cell, and then the cell's value. So the cell dot value is actually what's inside the cell. That's what you actually see when you look at the spreadsheet. And we'll do that. And so I've got it formatted, try to make a little senile. So I've got here, so this is the cell, cell A1, and then it's printing the actual cell data itself, so it's a type cell, clean data dot A1. The class is an open PyExcel cell, so it's that type of object. And then here we see the actual value, so employee num, call center, division, so if we look back at the spreadsheet, cell C1 division, you see here that cell C1 is division, just the headers, so, but again I point that out because when you are pulling these things out of open PyExcel, you're getting this cell to get the actual data value you need to call that value attribute. And why does this happen? Let's see. You also get different data types inside, so here in this demo worksheet, I'm looking at cells E1 and E2, you can also reference cells by the specific cell number, by the specific cell identifier. So cell E1 and E2, here, column E, E1, E2, here's the text string that worked, here it's a date value, so that's E1 and E2. If we look here, we see that E1 is that string that worked, it's class string, and then E2 is a date, and it's a date time object, and so Python is able to handle date time objects and that's what you're getting it, so it's giving you a date time object. This next cell is basically just the same thing. Well, actually, yeah, this next cell is the same thing, but it's showing you just the values. And so why is there this cell object? There's a cell object here because when you're looking, you do think the spreadsheet, there's data in the spreadsheet, but there's a lot of other information around a cell in a spreadsheet, and so we'll just take a look at some of that. So we've got all these other attributes, is, you know, does the cell have a comment on it? Is there a border on the cell? What column is the cell in? Again, a comment. Is there some sort of a fill? Is there some sort of background color on the cell? What font is on the cell? And so, you know, what row is the cell in? Here, finally, what's the value on the cell? That's usually, that's usually what people are after, is the value in a cell. What styling is on the cell? So this is why you get these cell objects back. But I point that out because if you go to, say, if you tell that to give you cell before, you're not going to get a value, you're going to get a cell. You then have to ask specifically for the value, but you can also ask for other things. I'm going to take a quick look at cell styles. And so here, we're looking at cell E2. We want the font name, the font color, the background color, and the border. And so, the font is aerial. And then here, more details about the cell color, more details about the border. And again, I mean, there's a lot of detail here, but you can read all of these values and you can, you can write them as well. And so, if you want to look for things that are only in a red cell or you want a certain result to be returned in a cell that has a blue background or has a certain border, those are all things that are available. Make these precious beautiful has been left as an exercise for the viewer. I don't have time to demo all that, but again, those are things you can do on your own. So, now moving on to the more sort of, quote-unquote, hardcore portion of the demo. Doing this aggregate time sheet information. So, we've read in this sheet. We saw in the earlier code, I read the workbook in, and then I read in this sheet as demo sheet for clean data. And so, now what I want to do, if I want to aggregate this information, if we look at it, we see we've got, for this employee, for instance, we have multiple rows, each representing a different amount of time they worked on a different day. We want to aggregate that. So, what we want to end up with is how much time did this employee work in this month. And so, we've got data here for a fictional June 2017. So, first we're going to start out with creating a list of employees. I've done this here in a for loop, but then I've also done it in a set comprehension. I think because of time, I'm not going to go into the distinction between a for loop and a set comprehension. If you have questions about that, I'm more than happy to answer those. But I want to make sure that I sort of demonstrate as much as I can here. So, what this is going to do, I'm creating a set comprehension, and it's basically going to say, I want the row zero value. For each row in the demo worksheet dot rows that generate our object, that gives you one row at a time. I want the rows, the, I want the zero index or the first thing in the row for each of those rows, if that row is not equal to employee number. And that's just a simple thing I did to get around the fact that it's going to grab every row, including the ones with the headers. So, I'm just blocking that out. And I'm using a set comprehension here. So, again, if you're new to programming a set in Python, is it something, is another collection, so similar to a list or similar to a tuple, the difference being a set will only have unique, a set won't have any repetition. So, each value in a set is unique. And so, if you have, you know, if you have five people in a room, two of them named Mark and three of them named Steve, a set will only have two elements, one Mark, one Steve. So, it eliminates duplication. And that's what I'm doing here because I only want, I only want individual employee IDs. So, that's what I'm doing here. And then, so what does that set look like? Looks like this. So, I end up with this, this set of employee IDs. And I know from having looked at it before and having done some filtering in the spreadsheet that I should have 49 employee IDs here. Because that's how many employees I've got. So, now I have this, this set of keys that I can use for employee. And now, how do I aggregate this data? Here, I've decided to, just to build a dictionary. And for reasons that we'll see a little bit later. So, and again, so at this point, we're sort of mixing some Python with OpenPyXcel stuff. So, here, this is specific to OpenPyXcel, the fact that each of this row zero is a cell. And so, I need to get the value out of that cell. So, row zero dot value, that's what's giving me the value from that cell. That's an OpenPyXcel thing. The rest is fairly standard Python. If we look at this aggregation data, again, we've got a lot of standard Python here. For each employee in that employee ID's construct that I made before, I'm going to assign that employee's ID as a key in an empty, in an empty, as a key in a dictionary. I've created this empty dictionary employee aggregate, where I'm aggregating the time sheet data. And then for each employee, I'm going to set that employee as a key to another empty dictionary. And then I'm here, I'm using a list comprehension to make a list of all the hours for that employee. And again, row six dot value, row six, if we take a look at, again, from that demo's, demo worksheet dot rows construct, row six, one, two, three, four, five, six, not six, but again, Python hence, and this gets tricky. We'll see this in a second. So for those of you who are more experienced, Python indexes from zero. And so the first element is called zero. But with the spreadsheet, it starts counting from one. And we'll see how that comes into play in a second. So here we have zero, one, two, three, four, five, I can't, I can't keyboard, zero, one, two, three, four, five, six. So I want to, I want index six, the hours worked. And I'm making a list of those here, because I don't want unique values, I want all those values, then I'll sum them later. But for cost center, division and manager, each employee should only have one. One cost center, one division, one manager. And so I'm making a set comprehension here. And so that will give me back the one value for those. I do QA, and I also used to be an auditor as an accountant, so I'm one who likes to double check things. And so I know that, that my cost center and my division and my manager should only have one value, so I'm just doing a search in here to make sure that that's the case in my calculations. And then I'm assigning those values into that employee dictionary that I made before. And so hours here, I'm taking that hours list and I'm summing the hours in a list to get a total. And then here I'm just assigning those other values. I'm making them, I'm converting them to a list here because before they were a set. And so I'm making them a list here. And I'm actually, I'm making them a list and taking the zeroth element, which is the first element, but since it's a list of one, it's just taking that value out. Which in this case is actually going to be a string. And so what does that look like? Now once I've aggregated that data, using the p-print function here just to make it look nice. And you end up with this. So you end up with a dictionary like this where you've got employee ID number and then their cost center, their division, and the number of hours they're working in a month. And so you've got that for each employee. And again, the length of that is 49. So I know I've got 49 employees. So I know I've got the right number of people here. So we've taken it out of spreadsheet. We've aggregated it. And so what do you do with it now? So we've seen reading from a spreadsheet. One of the things you want to do with OpenPAI Excel, read it from a spreadsheet and then make use of it in your Python program. So we've done that, pulled it into a program, dumped things with it. At some point you might have a program that you've run. It's created some results. You want to write those to a spreadsheet. How do you do that? First, you want to create a spreadsheet. So here I've just created a new workbook object, our output book, which is again another OpenPAI Excel workbook object like we saw earlier. And then in that workbook it will by default create a sheet like sheet one like when you open a new spreadsheet. But I wanted to get, I wanted a sheet with a specific name that I could work with. So here I've said this variable output sheet and use this create sheet method that we, you might remember from when we looked at the workbook. So you can create a sheet. Here I've given it a name, aggregate time. And then this zero argument means it's going to be in the zero position. So it's going to be the first sheet in that workbook. So a new workbook and a new sheet in that workbook that's going to be at the front. This is all in memory. This doesn't actually exist as a file yet. And so when we look at the output book here we notice that it is again it's a workbook object which is the OpenPAI Excel data type. And so we have our output spreadsheet target, our output workbook target. And so now we want this to sort of look a certain way. And so we want there to be a header like there was in the original sheet so that when you give this to someone they can see, oh, what are these values? So building a header here. So I'm just making a list that I'm calling header. And this is another way that you can access values in the demo worksheet. Before I sort of went through by row and picked those values out, you can also just go to specific cells. So here I'm just referencing the specific cells. So go to cell A1, get that value, go to cell B1, get that value. And those are going to be the headers from the original spreadsheet, from that demo worksheet. So I'm just making a header that way. Also just to show that you can go and just grab a specific value from a specific cell if you want to. And then this is what our header looks like. And so here's what our header list looks like. The list of strings. Nothing too exciting. Then I'm going to build this output data construct. And the first thing I'm going to do is I'm going to append my header to it. So this output data structure, it's a list. I'm going to append my header list to it. And then I'm going to, and again, so here we've got fairly basic Python. For those who are newer developers, as you may recall this employee aggregate that we saw earlier is a dictionary. If I use this iter function, it will iterate over the keys in a dictionary. And I'm going to just go through each key in the dictionary, each key representing a person. Go through each of those people. I make this empty new row and then I'm just appending to that list. The new row, which is a list, I'm appending to that list the values that are inside the dictionary for each employee. So I'm going to basically create a new list inside that outer list for each employee. So each new row is going to be a list that I'm adding in. So going iterating through the dictionary, each key, each person is going to become a new list inside. So I've got a nested list, a list of lists. So I'm doing that and then after I've completed that new row, that last new row, I'm appending that new row to the output data structure. So first I appended the header to the output data structure and then I'm going through that dictionary and appending each row, which is a person, to that output data. So when I'm done with that, I have my output data structure and then here I'm assigning those values to my worksheet construct, this in memory. And now this is where we begin into that difference about indexing from one versus zero. So I'm saying here for row and output data, so for every row and output data, which is that nested list, I need to grab the row index, which is output data, the list, that index for row. And then I'm also saying, okay, for each column in a certain range, so columns in a certain range, which is the length of output data zero, output data zero being the first list, which is the header. So in the length of the header, that's going to be how many columns I want. I'm going to say output sheet, which is this worksheet we have in memory, dot cell. And so I'm trying to address a specific cell, but which cell? Here, again, and so this is where you get it into again, open pi, excel, nomenclature. Row equals, and then you can give it a value for row. In this case, I want row index plus one, because Python will give me this row index, it'll start from zero. And the same here with columns, each for columns we're going to have this range value, that will start at zero as well. And so I'm having to add the plus one to make that adjustment between Python and spreadsheets. And so for this row, first row, first column, dot value, and so I'm saying for my worksheet, this in memory, the value I want for that particular cell, set it equal to the same row and column index in output data. So I'm just basically going to writing, that's the actual sound it makes too. So don't be alarmed if your computer starts to make those sounds. And so that's basically just going through and just writing that out. And let me make sure I run that code. And then after you've done all that, it's important that you actually save your file and give it a name. So output book dot save, output book, again, being our construct, we're saving it. And I give it this file name done piohio, because I'm at piohio. And there we go. And so that's not terribly impressive. Like, okay. It's not terribly impressive. But when we take a look at our file listing, and I've got this sorted by time, you notice that here at 1640, we've got this file piohio, there we go. You have this file done piohio.xls. And so that file was created just now. Let's take a look at that and see what that looks like. Just do this so it's more visible. Open. There we are. And voila. So this is, this has taken, this has taken data from this spreadsheet, these thousand rows, and compressed it into these four, these, well, 50 rows, one header row of 49 rows of data. So I have one employee, and the hours that that employee worked for the month, and their cost center, the cost center for that employee, their division, their manager, and the number of hours that they worked in a month. And you can double check a thing like this. Let's see, people in the front will be able to see this, but you want, at the bottom of the spreadsheet you can see 53, 86, there we go. 53, 86 is the total number of hours in our new construct, and in our old construct, we have the same number of 53, 86. So, if you'd applaud or not, I don't know. There's a question as to how much applause spreadsheets actually need, but what's it there? Exactly. So that is what it looks like to redate an infamous spreadsheet, process it, do something with it, or take the results of some sort of Python process and then send them back to a spreadsheet. And so, and again, with the style options, there are things you can do if you need to, I've done things like with this, where I've created reports that have borders and nice things around them and colors, so you can do all those things as you see fit, it just requires a little more manipulation. The other thing you can do is you can turn this into some sort of a JSON configuration file. So this is sort of similar, it's a very simplified version of what we do, what I showed you, what we do at work that we use to configure our ICU dashboard software. So with that same work that I've done before, I have that employee aggregate dictionary that was created, import the JSON module, and then just give it a file name to write to, and then here I use the JSON dummy, and again, this is all just standard Python, there's nothing open PyXL specific about this, but if you know that you want to create something in JSON, then you just need to structure your construct so that it can be done that way. So I'm calling it, I'm using this, I'm telling it to dump the employee aggregate construct, and I want sort keys equal true because I want this sorted in the order of the employee numbers, because that just makes it a little easier to read, and then indent equals four makes it a little nicer to look at, then we run that, and again, doesn't seem to be anything terribly exciting, but if we look at our output, and so if I run this again, you'll notice that the newest file is now this aggregate time JSON file, so let's take a look at that, and that is actually in my editor, and so this is the JSON file where I've got, again, each for each employee, I've got their detailed information, what cost center, what division, what hours, and who their manager is, and I can then use this JSON file to do something else, to configure something else, to determine how something else operates, that sort of thing. Again, there's a huge amount of things that can be done with this, you can take a look at the documentation to see those things in greater detail, let me go back here, and cover a couple more caveats, so reading data in from a spreadsheet, processing it, writing it back out to a spreadsheet, not unusually complicated, a few little things that I got for it, but nothing terribly complicated, you can sort of, you know, give it a little better without practice. The problems that you'll actually run into are the facts that spreadsheets tend to be often used as the visual medium, so people will create a spreadsheet that looks a certain way, because their boss likes to look at it this way, or it looks better when it's this way, I can read it better this way, and so often you will end up with a spreadsheet that is not easily processed, if all the spreadsheets that you ever get in life look like this, you know, or this, if they're just like the sort of database table top, database table dumps, you'll be fine, the truth is someone will make a spreadsheet that they think looks awesome, and then they'll want you to do something with it programmatically, and that's where the problems come in, so you'll have to learn how to deal with those things, again you can do, you can get around those things by addressing specific cells, or working with colors or things of that nature, but that's the issue, you have this unstructured input data, or data that only makes sense if it's being parsed by human eyes and a human brain, it's difficult to work it programmatically, so that's where, you know, we get around it in trouble, or visual output requirements, so maybe you've done this processing and you want to send it out to a spreadsheet, but whoever it's going to needs it to look a certain way, it's like, well I'm used to looking at it like this, and not like this, or I, you know, I want there to be borders around it so that I can see it what have you, again, with the style attributes that are available on the cells, you can do things of that nature, you can do things with the ranges, there are a lot of different options, making new friends, I talked about this, if you can help co-workers automate their work, you might be able to make a new pythonista, if you can sort of help them see, okay, hey you know what, there's actually a thing that you can do, so we can, because again, the example that I did was a thousand rows of data, and that's not super impressive, when I started teaching myself to do this, I was working with between 12 and 15 thousand rows of data, and it was being filtered and copy pasted manually into a thing that took like six or so hours to do, writing the scripts to compress that, took it down for about 90 seconds, and so if you're having to do that kind of thing every week, or every day, or every month, with 100 files like that, or a thousand files like that, then this code becomes more useful, so if you can show people this sort of thing, you can make new friends, maybe make a new pythonista, again, would they be willing to sort of standardize their spreadsheet in a certain way that would work better with code, if you get a 99 percent time reduction, that's literally what I did the math, it was like from 240 minutes to two minutes, well that's a 99 percent reduction in time, so would someone be willing to change the way they do their spreadsheet a little bit, to make that 99 percent time reduction happen, and again you can use the style attributes to create the look that's necessary, that's all I've got, if you've got questions or comments, feel free to ask me now, or while I'm here, I am large and easy to see, I am transition on Twitter, there is my website which is terribly underutilized, and the link to my github repo that has the ipop i notebook that I was demonstrating as well as the spreadsheets, you can take a look at those, so that's all I've got, thank you for your time, you have time for questions or maybe time for a couple questions of, okay, let me start, that I don't know, but I do know that Google has an API that will work with Python for Google Sheets, so I don't know that this library will be with Google Sheets, okay so, okay so the question was, good question, is there a way to make it run a macro, so I'm assuming you have a pre-existing macro, and you want to write code that will run that macro for you in Excel instead of rewriting it in Python, the question is, the answer is I'm not sure, but I recall in looking at some of the attributes, there were things about VBA code and VBI archives, so that might be possible, the other thing is, and I didn't mention this in my talk, there are basically three ways to work with spreadsheets in Python, method one is the one that I use, which is to write code that interacts directly with the Excel file and just ignores the spreadsheet program, it just looks at the file, another method is to be able to run a Python code inside of Excel, and there are plugins, I think something called Excel Wings that will let you do something like that, so you can write functions in Python that will run like macros, the third option is to be able to try to script the application itself with Python, so it sounds like what you're looking at might be something along like that second approach, maybe so maybe I would say look into, it's called Excel Wings, I think Wings but with a Y, that might be a way to do what you're trying to do, yeah so you can, so the question is, instead of working with plain text, can you work with formulas, and so the answer to that is yes, I didn't sort of go into any of that here, partially because when I read the file in initially, I said data only equals true, and so here I was only getting calculated results of any formulas that existed, but if you want to work with formulas, that's the thing that you can do, so you can both read formulas in from a spreadsheet, and you can also write formulas out, actually if you look at the OpenPAI Excel documentation, they give an example of that, of just writing a formula to a cell, we got three more questions, four more questions, a lot of questions, okay, let's see, I can't remember who was first, okay I'm sorry, the general the green shirt, then we go to the guy in the back, and then I think I might be done, so OpenPAI Excel, I was, I wouldn't think of it as a competitor to VBA, it's more a matter of it will let you get data out of a spreadsheet, and into some format you can work with in a python, and then it's python, so you can, whatever python can do, you can do with it, that sort of thing, the VBA thing is more an excel wings thing I think, a gentleman in the back, I have it, so the question is, and let's see, the prior question was about, and I didn't repeat it for the recording, the prior question was about how much of a competitor is OpenPAI Excel to VBA, my response was it's not really a competitor, it's sort of a different approach, this last question was about performance, and maybe trying to run this in a Docker container, you could absolutely do this in a Docker container, it's just, yeah, I mean you could just, so I'm actually doing this in a in a virtual environment, but you could do the exact same thing in a Docker, actually when we, when I run this code at work to make our configuration file, our engineer director just put it in a Docker container, and that way you can have dependencies, so yeah, so that's definitely a thing you can do, I just haven't had a need for it, yeah, I think I am out of time and somebody has to, somebody else is coming or something, so, but again, I am large and my head is shiny, so you should be able to find me, and or you can find me on Twitter on the internet, and thank you again for your time, your question.