 Hello, everyone. Thanks very much for coming. Hooray. All right. Thanks for coming to this talk. I appreciate that I kind of mis-titled it, and it's a very boring title. Maybe you think that I'm going to talk about how to output data to a spreadsheet with Python. That's not what I'm going to talk about. I'm going to talk about how to build a spreadsheet application with Python, how to build an alternative to Excel. That's me. My name is Harry. My Twitter handle is the HJWP. Like all Twitter addicts, more followers is cracked to me, so be followers. My website is abaythetestinggoat.com, where I talk about testing. But here I'm going to talk about magical Pythonic spreadsheet. So that's what we're going to do. Does anybody here know how a spreadsheet works? How a spreadsheet calculates the functions you put into it? Yes, sir. You do? Anybody else? Good. All right. So you at the end can tell me whether I gave a good description. And everybody else, my plan is to try and demonstrate that making a spreadsheet is a little bit easier than you think. And we're going to try and build one up step by step. So first, I'd like to take you back to, let's say, 2005. It was a simpler time. There was no, or Facebook had just started out and just got out of universities in America. MySpace was still all the rage. I had a MySpace page. Britney Spears had just released her seminal comeback album, Toxic. And of course, it was a great, great year for the beginning of the North American ragged jungle renaissance, if you don't like Britney Spears. So in this time, a series of programmers got together and they had this idea that everyone loves using spreadsheets, but actually working with them sort of sucks because you have to use VBA. And wouldn't it be great if, instead of having to use VBA when you want to script your spreadsheet, you could use Python? Wouldn't it be great if there was a Pythonic spreadsheet? So these guys got together and they had this crazy idea and they went off and they built a Pythonic spreadsheet. It was a GUI app. And then four or five years later, I joined the company as we were just re-implementing this app in a web-based form. Come in, you're not late. The conference is late. It's lovely to see you. Hi, Paul. Hi, David. So we're going to build then a Pythonic spreadsheet with our colleagues here back in 2010. And it's going to be a web-based tool. And what I want to show you is take you step by step through how can we build a working spreadsheet with Python starting from scratch. And it's going to look a little bit like this. So first of all, we'll assume that the GUI is a solve problem and that we can make a two-dimensional grid like this with all clever JavaScript that's going to allow the user to interact with it. And we're just going to build the back end, the engine for recalculating the spreadsheet when the user does stuff. So here's a spreadsheet. I want to be able to do things like type into it. Yes, it's a talk with live demos. It's going to go wrong. I'm going to type in things like this. Sorry, ignore that. So I'm going to type in two. And I'm going to type in, and three. Fine. OK, so what have I got so far is I've got a sort of two-dimensional grid. And if I want to store this data, I'm going to be able to say, OK, well, a1 is 1, b3 is the string, and 3. So we'll start off nice and easily. That might look, I'm going to propose to you something like this. We're going to have a dictionary. It's going to be indexed on a tuple of row number and column number. And that's going to contain a cell object. And that cell object is going to say, oh, I'm 1. I'm 2. I'm 3. So far, so good. Hooray! Round of applause. Yeah. You guys, you guys, you guys, you can't just applaud when the speaker asks for it. That's like super cheap. But I'm the cheap one. I'm not you guys. All right, so that's a pretty useless spreadsheet. Let's see if we can't make it do something better than that. Like over here, what if I want to go equals 2 plus 2? A spreadsheet should allow us to do some maths. OK, so who do you think is going to happen when I press Enter here? No, it's going to work. Hooray! 2 plus 2. So it works. That's not too bad. And notice what I've introduced here is that there's a difference between the formula. The formula is equals 2 plus 2. And then the result, or the value of the cell, is 4. So now I've introduced a cell. It's not just a bit of text. It's also there's a distinction between formula and value. And that might look a little bit like this in code. Let's say I'm going to have a cell class. It's going to have a formula and a value which we initialize to a sort of magical undefined special variable. And then when we want to calculate the worksheet, we just go and find all of the cells in the worksheet. Don't look back at your slides. That's in my speakers' tips. Are we going to go through all the cells in the worksheet, and we're going to see, ha, does this start with it equals? In that case, it's a formula, and I have to do something special. Otherwise, the value is just what the user entered. So what special thing can I do to get 2 plus 2 to turn into 4? I can basically just do eval, cell.formula. Room answer, just call that. And so 2 plus 2 is going to turn into 4 because I'm going to call eval on end. Hooray, eval statement is the best thing about Python. They never go wrong, do they? OK. Now, is anyone particularly evil that would like me to change the formula that I've put into this cell? Any suggestions? Yes? I could do a yes you can, and you can put it in a string. How about if I do this? Everyone wants to see this, right? Error, oh, hang on a minute. I thought of that, too. Somehow we have to handle this. We have to notice when the user does something stupid. We have to catch some errors, maybe give them a nice little trace back and show them a little warning error. So division by 0. So how could we do that? OK. Well, we're going to put try accept, classic. We're going to call eval cell.formula. And then when we catch an exception, we're going to go and populate that error on the cell instead of calculating its new value. Fair enough. OK, well, so hooray. Well done. I've got a spreadsheet that's basically a two-dimensional calculator. None of the cells can talk to each other. This is still not a spreadsheet. It is still no better than a calculator. So what we'd really want to do is maybe be able to refer to other cells in the spreadsheets, right? In my things, I want to be able to go something like a1 plus a2. Anyone think this is going to work? Yes, hooray! a1 plus a2. OK, so what have we done there? We've taken something that looks like a1 plus a2. And we need to somehow turn it into something Python's going to understand. So in a way, we're taking a1 and a2. And we've got our worksheet object, which is a dictionary. And it contains all of the cell objects. And basically, we want to translate the sort of string a1 and the string a2 to become some valid Python. So if I manage to turn a1 into worksheet11.value and worksheet12.value, then I could call eval on that. And so the way we're going to do this is we need to transform things that look like Excel formula, things that include cell references, into things that look like that Python can understand. And we've already got our worksheet object to refer to cells. Does that make sense so far? OK, all right, so what are we going to do that? OK, fine. We're going to have a little formula, a little setter for our formula. We're going to say, hey, if it starts with an equal, then we're going to go and parse the user's input and other one. And even, I'm going to show you that you might get a formula error. So if there's a syntax error in their formula, you can put that in there. And meanwhile, you can transform a user's formula into a Python one. Does that make sense? Now, would you like to see some of the magic of parse to Python formula? Enthusiastic, yes. Yeah. Yeah, yeah. So this is the first little bit of recursive fun. And I thought I'd present it using some tests, right? So we want to be able to say equals1 should turn into 1. Equals1 plus 2 should turn into 1 plus 2. EqualsA1 should become worksheet1.value. And then you're going to have crazy formulas in your things. You can have x times A1 for x in range five. That is a valid formula that you can enter into this Python spreadsheet. And that's going to turn into this. So all of these sorts of things will happen. I'm not going to go into the details of the parser. Parsers are parsers. This is a special one that knows how to understand Excel and turn it into Python. But you're going to have some recursive fun. You're going to look at a node. You're going to say, hey, if it's a cell range, we rewrite that. If it's a cell reference, we rewrite that. And then we're going to call the parser the rewrite function on each of the nodes inside it. So if you've got a cell reference or a cell node, you've got some children. So that's the first bit of recursive fun. You parse things. And you look at your A1s. And you've transformed some A1, A2s into a valid Python. OK. So that makes sense. But we're not really finished with that job. So we've got equalsA1 and A2. What if we have something like this? I'm just going to go around. Sorry, I should have opened this before I started. What if we'll, I'll just show you on this spreadsheet instead. OK. So I've got equalsA1 plus A2. And that's fine. Now what if over here I have another one, which says equalsB3? Well, now I can't just evaluate these cells in any old order. Because before I can calculate this one, I need to know that it depends on this one. And then this one depends on these two. So what I've introduced is a kind of dependency graph. That's the fundamental structure that underlies a spreadsheet. You have a series of cell references. Cells point to each other. And those are the dependencies. And in order to actually do my calculation of my spreadsheet, I'm going to need to know what that graph looks like. So I can calculate the things that have no dependencies first. And I calculate things that depend on them. And then I calculate things that depend on them. And I do it in a sane order. And so we had a little bit of recursion in our parser. And now the real recursive fun begins. Recursion, it's the only truly fun thing about programming. We're going to do something a little bit like that. We can parse the dependencies out. We've used our parser to recognize A1s anyway. So we can also say, oh, OK, that I can find for any cell formula. What other cells does it depend on? I can have a little function to calculate a cell. I'm going to go, OK, eval the cells Python formula. And now I'm going to be able to build a dependency graph. And this is going to say, what order am I going to be able to re-evaluate my cells in? If I build a dependency graph, find out everything that depends on everything else. I can find all the leaves in that graph. I start with the leaves. I pop them off this sort of queue. And each time I calculate a particular cell, I can then remove it from its parents. Does that make sense? We're going to say that when you've calculated a cell, the things that depend on it don't depend on it anymore. So you remove yourself from the parents. That means that we did have a graph that was a one-way graph. We knew what every cell depended on. But now we need to know who and so we know the cell's children. And now we also need to know the cell's parents. So we need an algorithm for parsing a one-way graph and turning it into a two-way graph. And let me see if I can explain that to you. So build a dependency graph. We generate a cell subgraph for each of the cells. And we're going to parse in these arguments. We look at the worksheet. We keep track of the current graph, which is the two-way graph instead of the one-way graph. We keep track of the current location. And we keep track of what things we've completed already. So like any recursive algorithm, the very first thing you need to think about is what's the exit condition. The exit condition is a thing that you come across and know that you've already done, in which case you can exit out. Fantastic. Next, we look at our cell. We know who its children are. So we're going to add them into the graph. And then for each of those children, we're going to recursively call the same function to go and do the same thing. So we do the children. And then we recurse down into each of the children. And once we've done that, we've completed that particular cell. Adding the children involves creating a node for the parent. And for each of the children, we create a node for them. And we say that you are in the parent-child relationship with it. Does that make sense so far? Hooray. Who now? That's quite a lot of hard work. You have to wrap your head around it. And I'm pretty soon you'll be thinking, hang on a minute. What if I have A1 depends on A2 and A2 depends on A3. But A3 depends back on A1. And I've got a circular dependency. So I can't quite do it like that. I'm going to have to track the current path that I've taken through the graph as I'm recursing down into it. And if I spot that the current location I'm at is already in the path that I came to in the current stack of the recursion, I'm going to raise a cycle error. And that means that I need to catch the cycle errors when I make the recursive call down into the algorithm that's looking at each of the subcells. So that is going to make sure that I can also catch cycle errors. Still OK, everyone? Yep. OK, fantastic. Who knows a better way to do this? It's RatedTool? Not sure about that. You can do a thing, there's a thing called NetworkX, which is a network analysis package. And you can give NetworkX a one-way graph, and it will just give you back a two-way graph in two lines of code. So a lot of wasted effort and hard work on the part of the dirigible spreadsheet developers. Yeah, but it was character building. We enjoy recursion. It made us joyful. So fantastic. All right, so far so good. All right, great. I've now got a little thing that can go, OK, B3, A1, A2, A3. Now, what if I wanted to go into my spreadsheet and actually start having some proper Pythonic fun? So if I wanted to define some custom functions and now start using more Python in my actual spreadsheet. So if I wanted to go deaf, foo of, like, say, X. And we're going to return X plus 42. All right, great. It'd be lovely to be able to use this function foo inside of our, what should we call it, spreadsheet, right? Am I here if I could do foo of 3? Do we think that could work? Yes! Hooray! All right, how are we going to get that work to work? We've now got two sets of user inputs. We've got all the values and all the formulas that they've put into the spreadsheet. And we've got also some custom user code that they've put onto the right-hand side. So what we're going to need to do then is we've got two sets of evaluations. First, we need to evaluate the user code. And second, we need to evaluate each of the cells. And the way we're going to do that is we're going to start isolating our eval calls from the global context, which is probably something we should have done ages ago. And we're going to call eval the user code, and that's going to populate the foo function into this context. So a context is just a dictionary. All of Python's namespaces are just dictionaries. Namespaces aren't they great? And then I'm going to pass that same context to the eval context of each one of my cells. Hooray! I've now got custom functions. But I am sure I hear you ask, what if we want to write a function that actually can access things that are already in the spreadsheet? So like supposing, rather than having a function foo, I'm going to have a function that say, have I got it over here, is going to like say, sum everything in row A. And I'm going to want it to say, okay, well look at things that are already in the spreadsheet. So I can't evaluate my user code before say I've already done a cycle of loading some of the constants in the spreadsheet. So I can always load the constants if they've got, if I know, yeah, if something is a constant, I don't need to evaluate it so I could evaluate, before I evaluate each of the cells, I can evaluate some of these functions and then I'm going to have something like this. All right, so I'm going to say, load all the constants, then eval my user code, so the user code has access to the constants and I can have a function that looks at what's already constant in the spreadsheet and then evaluate the formula. But as I'm sure you're wondering, what if I wanted to write a custom function that can access the results of evaluating the cells? Well, then I'm going to have to say, maybe I'm going to let the user input two types of user code, one to be run after we load the constants but before we evaluate the formulae and one to be run after we evaluate the formulae and sure enough, that is what one can do and you're going to do something a little bit like this. So you've now got a load constants function, you've got an eval of the user code pre-former evaluation, you evaluate the formulae and you eval the user code post-former evaluation. Who is still with me? Hands up. Okay, almost everyone. So that's fair enough. So here's the real one. So that would mean then I would have to put two little code panels over onto the right-hand side and hopefully, this is the bit of the presentation where something slightly magical happens. If we have a look at this bit of code here where we're doing, okay, let's build a context. We add the worksheet to it. We to call this load constants function, we call the evaluate formulae function and then we've got this sort of user code that's kind of like before and afterwards. What if instead, you had a thing, oh, incidentally, you can now, if you put things into the pre-former evaluation functions, you can actually put formulae into the spreadsheet from the user code panel before the formulae evaluated. This is a certain, I mean, this amazing Python spreadsheet is a surefire set of guns pointed at both feet at the same time. It's brilliant, okay. So what about, this is what we're doing right now and then what the user actually sees when they log into a dirajor spreadsheet, and then I'll start a brand new one for you, is they see this. The user code panel is pre-populated with a function called load constants, which you're noticing is exactly the same name as the function that I used in the actual evaluation of the spreadsheet and it's got a function called evaluate formulae, which you might remember was the same name as the function we're gonna use in the real evaluation. And so here what I'm gonna do is I'm basically gonna turn the whole spreadsheet on its head and say that the spreadsheet is the user code and I have my load constants evaluate formulae in my calculate function, I take my context, I put the worksheet into it, I put the load constants function into it, I put the evaluate formulae in context, which takes the worksheet and carries it into my normal evaluate formulae function. And then all I do is exec the user code and it's just your user code panel that loads the constants and evaluates the formulae and all of that. And that means that you can do totally crazy things in your user code, like you can evaluate formulae multiple times or you can put nested recursive calls to the spreadsheet itself, you can make spreadsheets that call other spreadsheets, you can populate formulae's time programmatically, all that sort of fun, the spreadsheet is the user code, the user code is the spreadsheet, this is the most pythonic spreadsheet available, am I telling you that you should use this spreadsheet? Absolutely not, if you wanna work with spreadsheet type data in Python, just use an IPython notebook and pandas, this is just a little bit of fun that I thought you might be interested in. The dirigible source code is all available on our github at github forward slash python anywhere forward slash dirigible spreadsheet, if you fancy taking a look at it, I've now given you a tour about it and hands up, please, if you found that easier to understand than you had thought it was going to be to understand how a spreadsheet, wait, no, hands up if you now understand how a spreadsheet works. That is like a 90%, I'd say. Hooray, and who thought that was easier than they thought it was gonna be before they arrived into the room that they thought that they thought that the spreadsheet reevaluated what it was gonna be? Ha ha ha ha ha ha ha. Irish just held up a sign saying, are you kidding? Ha ha ha ha ha ha ha ha. So hands up if you thought that was easier and simpler than ever, one person. All right, that's good enough. No, no, no, please, hands up if you genuinely thought it. And if you thought that was really confusing, there you go. Hands up if you think I should never do this talk again. All right, so by implication, that's two people putting their hands up, incidentally, any program working groups that are considering talk submissions for future conferences. Two people put their hand up, which means about 200 people in here think I should do the talk again. Thank you very much everyone, good night. All right. Okay, we'll have four minutes for questions. Oh, hooray. Hi, did you think about security? Yes, we did. So the, obviously that was one of the major things we were giving this out to random strangers on the internet saying, hey, will you eval your Python code on our servers for fun? So yes, we have a sort of sandboxing model that allows, make sure that each of the users can only access a restricted part of the file system and basically the whole containerization story that you've heard to death at this conference already, no doubt. That's the story. I have stripped all that code out of the dirigible that's published online, although you can find it in the history if you're interested, because it's easier to demo and easier to understand without the security stuff. But yes, you can. Good question. Okay, maybe one more quick one. Thank you for the talk. It's more common than a question. Well, to be serious, when I look, my colleagues work and they work a lot of time with the Excel, I think it could be some kind of killer application or serious because there is some, we can work with pandas and do it at this level, but during our work, often people really want to change single cells among 1,000 cells because of some special thing they just heard or whatever. And if this is easily trackable and expandable with Python functions, I think it's really it has potential for really serious or useful tools. No, you're right. I mean, the spreadsheet is a wonderful tool. I wrote a thesis about it for my masters and that whole sort of instinctive two dimensional, I can see the numbers I can change and I can visually represent relationships. It's all wonderful stuff, but it turns out nobody wants to use a spreadsheet other than Excel. And so as soon as you try and build a competitor to Excel, everyone's like, oh, I want every single shortcut key that Excel has to also work in yours, okay? And you're like, that's quite a lot of work catching up with like 20 years in Microsoft. So however, any IPython developers, IPython notebook developers who want to integrate some sort of spreadsheet component, we'd be happy to like point you around the dirigible code base and see if there's anything in there that you can use. There you go. What time do we have to finish, Iris? Two minutes. Two minutes, hooray. That's time for another question. There you go. Again, hooray. That's someone really engaged. Everyone else is like, how long before we can get? Maybe a weird question for a Python conference, but did you think about implementing this in JavaScript? So the straight answer to that is no. But I'm thinking about it now. Yes, one of the interesting things about it is that you run things on the server. So remember when we executed that, we looped around all the leaves in the spreadsheet and we had a sort of queue of them? It's very easy to start parallelizing that. And then you can do things like recalculate your spreadsheet across a cluster of machines because once you've got a leaf node, that's totally independent from all the others. So we were thinking maybe massive parallelization would be an interesting market spot to be in. You can't really do that with JavaScript. Is that the right sort of answer, Charles? Yeah, yeah. There's no JS then. And no JS didn't exist back then. I know, but you were thinking maybe that you'd have the whole thing on the browser. Well, mostly to run it all locally. Yeah, all in the browser. I'm sure you could rerun it. All Python to JavaScript is pretty easy to rewrite. Knock yourself out. I'll send you a pull request. OK, exactly zero minutes. Let's rush to the lacking talk to see more Harry's performance. Yeah, I do have another job. Thanks very much for coming, guys. There you go.