I'm going to talk about automatic conference scheduling using PuLP, and first a few words about myself. I'm Marc Lemburg. I've been doing Python for many, many years. I studied mathematics. I live in Düsseldorf in Germany. I'm a senior consultant and I did a lot of work for the Python Software Foundation and the EuroPython Society, there's our talk manager, and yeah, I'm the chair of the EuroPython Society at the moment, and so I'm very busy organizing this conference. Right, so what I wanted to talk about here is linear programming. How many of you know linear programming? Quite a few, that's good. So linear programming is a term that comes from operations research, which is a field in mathematics, so it doesn't have all that much to do with computer science, and what they call linear programming is actually about optimization. Programming in this sense means that you want to find an optimal solution for a problem that you have. Linear, of course, means that the problem has to be representable using linear functions: both the constraints, the side conditions that you need to fulfill, and the function that you actually want to optimize need to be linear. This is very easy to understand and typically easy to write down, but finding optimal solutions can be extremely hard. As you may know from computer science, these problems are usually NP-hard, so they have exponential run time. Okay, my laptop is going crazy. Right, so let's look at the mathematics behind this.
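A common textbook way to write such a problem is sketched below. The symbols c_i, a_ji and b_j are notation assumed here for illustration and are not taken from the talk.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{i=1}^{n} c_i \, x_i \\
\text{subject to}\quad & \sum_{i=1}^{n} a_{ji} \, x_i \le b_j, \qquad j = 1, \dots, m
\end{aligned}
```

Each constraint allows only the values on one side of a line (or hyperplane), and the allowed region is where all m constraints hold at once.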
So what you have when you define one of these problems is a set of variables, the x_i variables you see here, which are usually floats or integers. Then you have factors that you multiply these variables by, you add everything together, and you say, okay, this is my objective function, the function that you want to optimize. You can use the same way of writing things down for your constraints, and I put up some pictures here showing constraints. Essentially you take one linear function, you define several of those, and for each you say that one side of this linear function is supposed to be in the set of allowable values. Should I just plug in again, maybe the HDMI, yeah? Yeah, okay, no problem. I hope my notebook won't crash because of this. Can you see something now? Is it working now for you? You saw that, okay, perfect. Well, you can see something, that's the main purpose of the talk, right? So I've drawn up some constraints here. You can see this yellow area: it is the set of allowed values, and of course within that area you want to optimize the function that you're interested in. So let's turn to PuLP. PuLP is a Python package which provides an interface to solvers for linear programming problems. PuLP itself is part of a larger project called COIN-OR, which has various tools that are usually very interesting to use, and it's all written in Python. If you're solving problems in this space, then this is definitely something to look at. PuLP is basically a front end to LP solvers, but it also comes with a solver that's completely written in Python. It's a slow solver, but it's great for debugging.
Usually you debug your implementation first using the PuLP solver and then you switch to one of the more advanced solvers like GLPK, which is a GNU one, a free one, and then there are some commercial ones. Obviously this is an area where there's a lot of commercial development going on, because writing these solvers is very hard and very tricky, and they're usually very expensive if you want to buy one. To define your problem in PuLP you have a few objects that you have to set up. These are the class names I've written down here. LpProblem is the object that you use to group everything together. It defines your problem and has references to the variables that you're using, the objective function and how you want to optimize things. Then you have to define the x_i variables in PuLP as well, and you do that using the LpVariable class. You create one of these objects for each x_i in your problem. An LpVariable can be declared as a float or as an integer, and there's also the option of a binary variable, which just takes the values zero and one. It's also possible to define a permitted value range for a variable, which makes things a bit easier because you don't have to define as many constraints as you normally would. Then you have the constraints, and there's a class called LpConstraint for this. With that object you define your constraints using the LpVariables. When defining these you basically write a short formula, and the solver then takes that formula and uses it in the process of finding a solution.
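As a rough illustration of these three kinds of objects, here is a minimal sketch. The toy problem, the variable names and the numbers are made up for this example and do not come from the talk; it assumes a PuLP release that ships the CBC solver as `PULP_CBC_CMD`.

```python
import pulp

# LpProblem groups variables, constraints and the objective together.
prob = pulp.LpProblem("toy_example", pulp.LpMaximize)

# LpVariable: a float with a permitted range, an integer, and a binary.
x = pulp.LpVariable("x", lowBound=0, upBound=4, cat="Continuous")
y = pulp.LpVariable("y", lowBound=0, cat="Integer")
use_bonus = pulp.LpVariable("use_bonus", cat="Binary")

# Adding a plain expression sets the objective function.
prob += 3 * x + 2 * y + use_bonus, "objective"

# Adding a comparison creates an LpConstraint from a short formula.
prob += x + y <= 5, "capacity"
prob += y <= 3, "y_limit"

prob.solve(pulp.PULP_CBC_CMD(msg=False))
status = pulp.LpStatus[prob.status]
best = pulp.value(prob.objective)
print(status, best, pulp.value(x), pulp.value(y))
```

The bounds on `x` stand in for two constraints that would otherwise have to be written out, which is the convenience mentioned above.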
PuLP also has some more advanced LpConstraint subclasses, for example for elastic constraints. Instead of just saying, okay, the value has to be on that line or in the yellow field, you can say it's okay to go, say, 10% into the white field. That is handy for debugging, because sometimes you set up the constraints and the solver says, okay, there is no value that is allowed. So obviously you have some problem in there, and then you need to make it more flexible, more elastic, and you can use those to figure out what's going on. Okay, so what about the documentation? When I started using PuLP, of course, the first thing I did was try to look up the documentation. I found that the documentation is a bit lacking. It's incomplete, it doesn't actually document everything, and it misses some details. There are some tricky things in PuLP that you have to know, which probably come from the OR space, and if you're not really into OR, then it doesn't feel as Pythonic as you would hope for something you use in Python. So the best thing you can do is just look at the source code, which is really easy to read. There are also a few blog posts that are helpful, because other people have of course had the same problem, figured things out and then written them down in a blog post. So what was the inspiration for doing this talk and for looking at conference scheduling as an example? That was a talk from David MacIver at PyCon UK last year. There's a video of that talk here. And of course, I was looking into this because of EuroPython. Before EuroPython 2017 we always used manual scheduling, which is a lot of work because we have around 200 sessions to schedule. So I was thinking that maybe this could make things a bit easier and simplify things, and also take criteria into consideration that we usually have to handle by hand, like speaker preferences or attendee preferences.
So we look at the statistics that we get, from talk voting for example, and then we try to use that to assign the proper rooms. Typical things you have to do in conference scheduling: you have to look at the room sizes and you basically have to guess how popular a talk is going to be. Then you have talks with varying durations, so the talk lengths are different. The slots that you define in your schedule are usually different as well, so you have longer ones and shorter ones. And then you have some other constraints. For example, a speaker may only be available for a few days of the conference, not the whole conference. Oh yeah, I forgot one: a speaker cannot give two talks at the same time. This is very important, because we made a mistake there this time. So let's see how you set up a problem in Python. First of all, you define a few variables. I use dictionaries for this. You define the rooms and the room sizes, how many seats you have in each room. You define the talks and the talk durations. Then you define the talk slots that you want to use in your schedule. Each of these is given as the room, room A for example, then the start time in minutes from the beginning of the day, and then the length of that particular slot. And then you can go ahead and start defining a problem. In this case, it's a very simple problem, and you tell it to maximize the objective function. Then you have to create variables. In the example you're not actually optimizing an objective function, because I'm not using one. The objective function is a constant function, so maximizing doesn't really mean anything. You're mostly interested in finding a feasible solution, a possible schedule that fulfills all the constraints.
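The setup described here might look roughly like the following sketch. The room names, talk titles, times and the problem name are hypothetical, since the slide contents are not part of the transcript.

```python
import pulp

# Hypothetical example data; names and numbers are made up for illustration.
rooms = {"A": 300, "B": 100}  # room -> number of seats
talks = {"Intro to PuLP": 45, "Async IO": 30, "Packaging": 30}  # talk -> minutes

# Each slot is (room, start in minutes from the beginning of the day, length).
slots = [
    ("A", 600, 45),
    ("A", 660, 30),
    ("B", 600, 30),
    ("B", 660, 45),
]

# The problem object; the sense is "maximize" as in the talk.
problem = pulp.LpProblem("conference_schedule", pulp.LpMaximize)

# Constant objective: any schedule satisfying the constraints is acceptable.
problem += 0, "constant_objective"
```

With a constant objective the solver is effectively used as a feasibility checker, which matches the point that maximizing does not mean much in this example.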
So what you do is set up binary variables which say, okay, the talk on the left in the tuple is assigned to the slot on the right. You do that for all the talks and all the slots. And this is basically the problem that you want to solve: you want to say which talk gets assigned to which slot. Then, of course, you have to use the constraints to make sure that you get a complete solution, so you want all the talks to be in your schedule. This is how you do it, and this is a very typical pattern. You take the sum of all the assignments that you have for the talks, and you say, okay, this needs to be less than or equal to 1, because you have binary variables. This essentially says that you need to assign, sorry, this is a different one. This is actually the one which says that you can only have one talk per slot. The second one is this one: we need to assign all talks. So you take the sum of everything and you say, okay, for each talk, the sum of assignments over all the slots has to be 1. Then you need to make sure that the talk durations fit the slot lengths that you have available. Again, you sum up all the assignments, you take the slot duration, and you make sure that the sum of all the slot durations where this talk was assigned is equal to the talk duration. Is this clear so far? Yeah, that's good. Then some speakers may not always be available, like I said. For example, this speaker is only available later in the day, so he, I don't know, takes longer for breakfast or something. What you do in this case is add constraints which basically say, okay, the talk may not be assigned to slots which are too early on that day. So let's say it's Friday after the social event, right? You don't want to do your talk early, so you do something like this: you make sure that this assignment is zero. And then we get to a more complex thing.
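Put together, the constraints described in this passage could be written along these lines. This is a sketch under assumed data: the talks, slots and the `not_before` availability rule are invented for the example, and the CBC solver bundled with PuLP is assumed.

```python
import pulp

talks = {"Intro to PuLP": 45, "Async IO": 30, "Packaging": 30}  # talk -> minutes
slots = [("A", 600, 45), ("A", 660, 30), ("B", 600, 30), ("B", 660, 45)]
# Hypothetical availability rule: this talk must not start before 11:00.
not_before = {"Async IO": 660}

problem = pulp.LpProblem("conference_schedule", pulp.LpMaximize)
problem += 0, "constant_objective"

# One binary variable per (talk, slot) pair: 1 means the talk is in that slot.
assign = {
    (t, s): pulp.LpVariable(f"assign_{i}_{j}", cat="Binary")
    for i, t in enumerate(talks)
    for j, s in enumerate(slots)
}

# At most one talk per slot.
for s in slots:
    problem += pulp.lpSum(assign[t, s] for t in talks) <= 1

# Every talk is scheduled exactly once.
for t in talks:
    problem += pulp.lpSum(assign[t, s] for s in slots) == 1

# The length of the assigned slot must equal the talk duration.
for t, duration in talks.items():
    problem += pulp.lpSum(s[2] * assign[t, s] for s in slots) == duration

# Speaker availability: forbid slots that start too early.
for t, earliest in not_before.items():
    for s in slots:
        if s[1] < earliest:
            problem += assign[t, s] == 0

problem.solve(pulp.PULP_CBC_CMD(msg=False))
status = pulp.LpStatus[problem.status]
schedule = {t: s for (t, s), var in assign.items() if pulp.value(var) > 0.5}
print(status, schedule)
```

Because each talk is assigned exactly once, the duration constraint reduces to "the one chosen slot has the right length", which is why a simple weighted sum is enough.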
So you want to make sure that the same speaker is not giving talks in two places at the same time. This may sound easy at first: you just need to make sure that in the same time slot you have assigned only one talk of that particular speaker. But the problem is that the slot durations are usually different throughout the day. If you look at the schedule, for example, you can see that the slots are sometimes longer and sometimes shorter, and you have 30-minute talks and 45-minute talks. And then you have trainings, workshops, and so on. What I marked in red here is the case where we made a mistake in the printed version of the booklet. We assigned Radomir to two slots happening at the same time. So this is something that we want to avoid, of course. It's fixed now, but in your printed brochure you can still see this. So how can we handle this case, where we have different slot lengths but still want to make sure that we don't have any overlaps? Because what can happen, of course, is that the scheduler says, okay, I want to have the speaker do a talk in room one early on, and then maybe half an hour later, I want another session with the same speaker to start in room three. The way that you solve this is to split up all the slots that you have into smaller blocks. You take 15-minute blocks, for example, which is something that works for EuroPython, and you split up all the slots into these blocks. Then you make sure that the speaker never gives talks in blocks which happen at the same time, so that this case does not happen. What you have to do here is define new variables, so you define the talk-block assignment. And then, of course, you need a few helpers, which I'm not going to go into in detail here. These basically just say, okay, this block is assigned to that slot, and this block starts at that time.
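The helpers are not shown in the transcript, so the following is only a guess at what they might do: split each slot into 15-minute blocks and record which slots are running at each block start time. The function names and the sample slots are hypothetical.

```python
BLOCK_MINUTES = 15  # block length mentioned in the talk


def slot_blocks(slot, block_minutes=BLOCK_MINUTES):
    """Return the (room, block_start) pairs covered by a (room, start, length) slot."""
    room, start, length = slot
    if length % block_minutes:
        raise ValueError(f"slot length {length} is not a multiple of {block_minutes}")
    return [(room, start + offset) for offset in range(0, length, block_minutes)]


def slots_by_block_start(slots, block_minutes=BLOCK_MINUTES):
    """Map each block start time to the list of slots running at that time."""
    active = {}
    for slot in slots:
        for _room, block_start in slot_blocks(slot, block_minutes):
            active.setdefault(block_start, []).append(slot)
    return active


# Slots of different lengths that partly overlap in time.
slots = [("A", 600, 45), ("B", 615, 30), ("C", 645, 45)]
active = slots_by_block_start(slots)
for start in sorted(active):
    print(start, active[start])
```

Two slots overlap exactly when they share at least one block start time, which turns the overlap question into a simple lookup.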
And that makes it easier to then define the constraint, which looks like this. The first thing that you do is tie the blocks to the slots that you have to assign. So you say, okay, for example, the light green one on the right in room three, this is three blocks, and you make sure that those three blocks are assigned in the same way as the slot. Then you go ahead and say, okay, for each start time that I have, I want to make sure that the speaker can only be assigned once to that particular start time and block row in your schedule. So this is how you solve these things. And then, of course, you go ahead and add any other constraints you might have. You define your objective function. In this case, I commented that out. It would be something like the happiness function of the particular assignment that you created. You have to define the solver. You can use the PuLP one, which is the one that is used if you set solver to None, but you can also use, for example, the GLPK one or one of the commercial ones. Then you run the solver, which is just the problem.solve() method call. And then it's important, and this is something I find a bit un-Pythonic about PuLP, that you actually check the status of the solver. In case it fails to find a solution, it doesn't raise an exception like you'd normally expect in Python. It just sets the status to something that's not 1. It took me a while to find that out, because I thought I had a solution since the run ended without an exception. Then I looked at the solution and it didn't look right. So this is something to keep in mind. Right, and then, of course, you want to show your results, so you basically pretty-print the results in some way. To do that you have to access the PuLP variables that you defined, and you do that using pulp.value(), passing it the LpVariable.
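A compact sketch of the overlap constraint and the status check follows. It is a simplification of what the talk describes: instead of separate talk-block variables tied to the slots, it sums the slot assignment variables of all slots running at each 15-minute block start. The speaker name, talks and slots are invented for the example.

```python
import pulp

BLOCK = 15
talks = {"Talk 1": "alice", "Talk 2": "alice"}  # talk -> speaker (hypothetical)
slots = [("A", 600, 45), ("B", 615, 30), ("B", 660, 30)]  # (room, start, length)

problem = pulp.LpProblem("no_speaker_overlap", pulp.LpMaximize)
problem += 0, "constant_objective"

assign = {
    (t, s): pulp.LpVariable(f"a_{i}_{j}", cat="Binary")
    for i, t in enumerate(talks)
    for j, s in enumerate(slots)
}
for t in talks:
    problem += pulp.lpSum(assign[t, s] for s in slots) == 1
for s in slots:
    problem += pulp.lpSum(assign[t, s] for t in talks) <= 1

# Which slots are running at each block start time?
active = {}
for s in slots:
    for start in range(s[1], s[1] + s[2], BLOCK):
        active.setdefault(start, []).append(s)

# For each speaker and block start, at most one of their talks may be running.
for speaker in sorted(set(talks.values())):
    own = [t for t in talks if talks[t] == speaker]
    for start, running in active.items():
        problem += pulp.lpSum(assign[t, s] for t in own for s in running) <= 1

problem.solve(pulp.PULP_CBC_CMD(msg=False))

# solve() does not raise when no solution exists, so check the status explicitly.
if problem.status != pulp.LpStatusOptimal:
    raise RuntimeError(f"no schedule found: {pulp.LpStatus[problem.status]}")

# Read results through pulp.value(); the variable objects themselves are truthy.
schedule = {t: s for (t, s), var in assign.items() if pulp.value(var) > 0.5}
print(schedule)
```

Turning a non-optimal status into an exception is one way to get the behaviour a Python programmer would expect from a failed solve.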
You have to do it this way: you cannot, for example, just write if and then assigned, because for some reason those LpVariables are always true, so this doesn't work. You have to use this approach to get at the value. Right, and then you get a solution to the problem, which in this case looks like that. This is not very pretty. Of course, you want it to be printed like this, so you define some extra helpers for that. And then you can use this on your website. You can inform the speakers that you found a solution, and then maybe they come back to you and say, okay, well, this doesn't work out because of this and that. Then you add extra constraints to your problem and just rerun your solver. Of course, when you rerun the solver it doesn't always give you the same solution. It's very possible that it gives you a completely different solution the second time you run it, even more so if you add other constraints. So you have to tell the solver to minimize the changes to a schedule that you've already published. And the way to do that is very easy. You take the existing assignments that you have made and then you add a penalty function. Every time something changes compared to the old schedule, you raise the penalty value. In that case, of course, you have to use an objective function, and because it's an optimizer, it will try to reduce the penalty value. So it will make as few changes as possible to get everything done. That's one thing that you should probably do when scheduling using PuLP. Another thing that you can add is room assignments. You can use the talk voting results, for example, to figure out which room sizes are needed, and you can add that information to the problem as well.
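One way to express such a penalty is sketched below: the objective counts how many published assignments are no longer kept, and the solver minimizes that count. The talks, slots, published schedule and the new restriction are all hypothetical.

```python
import pulp

talks = ["T1", "T2", "T3"]
slots = ["S1", "S2", "S3", "S4"]
published = {"T1": "S1", "T2": "S2", "T3": "S3"}  # schedule already announced
unavailable = {("T2", "S2")}  # new restriction reported by a speaker

problem = pulp.LpProblem("reschedule", pulp.LpMinimize)
assign = pulp.LpVariable.dicts("assign", (talks, slots), cat="Binary")

# Penalty: 1 for every published assignment that is not kept.
problem += pulp.lpSum(1 - assign[t][published[t]] for t in talks), "changes"

for t in talks:
    problem += pulp.lpSum(assign[t][s] for s in slots) == 1
for s in slots:
    problem += pulp.lpSum(assign[t][s] for t in talks) <= 1
for t, s in unavailable:
    problem += assign[t][s] == 0

problem.solve(pulp.PULP_CBC_CMD(msg=False))
if problem.status != pulp.LpStatusOptimal:
    raise RuntimeError(pulp.LpStatus[problem.status])

new_schedule = {
    t: s for t in talks for s in slots if pulp.value(assign[t][s]) > 0.5
}
changed = [t for t in talks if new_schedule[t] != published[t]]
print(new_schedule, changed)
```

Weights other than 1 could be used to make moving some talks more expensive than others, for example keynotes.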
You can add tracks, so you can tell the scheduler to optimize things in a way that all the talks of a certain topic, like PyData, for example, are grouped so that they happen on the same day and maybe even in the same room, so people don't have to switch. And of course, as always, it's rather easy to just write everything down, but translating the problem that you have into one of these LP problems is not always easy. You have to think a little differently because it's not programming as we normally do in Python. What you have to do is basically declarative, so it's more like SQL programming. And so figuring out how to actually write things down is not easy. You often run into the situation that the solver simply says, okay, there is no solution. You're basically stuck at that point and then you have to start debugging. PuLP does provide a few features for that. For example, you can give names to all these variables, and then you get debug output where you can see where things did not go right and figure out why there was no solution. There is something to consider in all this, though: the runtime. This is a major problem, especially for conference scheduling, which is essentially an NP-hard problem. You can easily run into the situation that as you add more variables to your problem, the runtime increases exponentially, and so it takes ages for the solver to actually come to a solution. And then after three days it may tell you, okay, there is no solution, which is of course not very helpful. Right, so in this slide I just summarized the few gotchas I already mentioned. Basically the conclusion here is that you should always test drive your solvers. Use the solver on a very small problem, make sure it all works, write test cases, and make sure that it gives the proper results.
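A small, solver-independent checker is one way to test drive a model: it takes a finished schedule and re-verifies the rules in plain Python. The function name, data layout and sample data below are assumptions made for this sketch.

```python
def check_schedule(schedule, durations, speakers):
    """Return a list of problems found in a schedule.

    schedule:  talk -> (room, start, length)
    durations: talk -> required length in minutes
    speakers:  talk -> speaker name
    """
    problems = []
    for talk in durations:
        if talk not in schedule:
            problems.append(f"{talk}: not scheduled")
    for talk, (_room, _start, length) in schedule.items():
        if durations[talk] != length:
            problems.append(f"{talk}: slot length {length} != {durations[talk]}")

    # Compare every pair of talks, ordered by start time, for overlaps.
    items = sorted(schedule.items(), key=lambda kv: kv[1][1])
    for i, (t1, (r1, s1, l1)) in enumerate(items):
        for t2, (r2, s2, _l2) in items[i + 1:]:
            if s2 >= s1 + l1:
                continue  # t2 starts after t1 has ended
            if r1 == r2:
                problems.append(f"{t1} / {t2}: room {r1} double booked")
            if speakers[t1] == speakers[t2]:
                problems.append(f"{t1} / {t2}: {speakers[t1]} in two places")
    return problems


durations = {"T1": 45, "T2": 30}
speakers = {"T1": "alice", "T2": "alice"}
good = {"T1": ("A", 600, 45), "T2": ("B", 660, 30)}
bad = {"T1": ("A", 600, 45), "T2": ("B", 615, 30)}
print(check_schedule(good, durations, speakers))
print(check_schedule(bad, durations, speakers))
```

Running such a check on every solver result, and in test cases on tiny problems, catches both modelling mistakes and the case where a failed solve went unnoticed.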
Do check your constraints afterwards, because in some cases, well, the solver may have a bug or something, and you don't want that. If you want to have things run faster, there are some other options that you can use, and I wrote down a few here. These are solvers which extend things beyond just linear problems, so they can do convex optimization. As long as you have a convex function, you can still run the optimizer and find good solutions. The first one is a very fast one. I haven't actually looked at it yet, but someone has, you can see the blog post there, and he got results that are 10 to 70 times as fast as PuLP, which is promising. And I'm almost done. Then there's another one, down here, for conic optimization, so you can use those as well if you want things to run faster. Right, so what did we do in particular for EuroPython? Well, first I wanted to mention that after this talk at PyCon UK, someone actually sat down and implemented this for PyCon UK. There is a project called conference scheduler, and for EuroPython we actually did look into this and tried to use it. Unfortunately we ran into exactly the problem that I just mentioned. It has exponential run time, so it ran for three days, and then we basically just killed the process and did not wait any longer. So then Alexander, who is the chair of the program work group, decided to write his own, and what's interesting is that he didn't use an optimizer. He just used some clustering and random shuffling, together with the experience from the previous years, to make things happen. This is a lot better and it actually works out pretty well. Right, that's all I wanted to say. Thank you. Okay, thank you, very interesting. Yeah, sorry I screwed it up but I have a good excuse.
I think Marc is one of the only speakers we don't have to introduce, so yeah, sorry again. Are there any questions? What is your thought on viewing this specific problem of conference scheduling as a hard mathematical problem versus an art? Let's say one conference might decide to put all the headliners in the same slot just so that other speakers would not be robbed of their audience, while another conference might want to spread out the headliners so that people would actually go and see the headliners. Well, I mean, you can put all these things into constraints when you do this automatic solving. Of course, when you do what we do at EuroPython, where we put in a lot of experience and have certain considerations that we always take into account when doing the schedule, you can basically try to make everything work out. So, what you just said, for example putting the headliners first and then having all the other talks after that, you can define that as a constraint. You can define for certain talks that these need to be in certain slots or maybe, I don't know, happen at certain times. Does that answer your question? Perhaps more generally, about whether this problem should be approached as mathematics or as art. Well, it's certainly a bit of an art, yes, but art in the sense that you have to experiment a lot to find a good way of actually doing the scheduling. We've been experimenting a lot with these things at EuroPython. I mean, this is the 16th EuroPython that we have. So we've tried a lot of things, and some failed, some were good. For example, the concept that we have now, where we have keynotes first on each day, seems to work out pretty well. We have tried now for a couple of years to have trainings, for example, integrated directly into the schedule rather than having them as separate days.
That does work well, but it also results in the conference having a lot more talk days than some people actually find good. So we're going to look into all these things, the feedback that we get, and then we're probably going to make a few changes going forward. Another question. Hi, very interesting. When it failed to find a solution, in your experience, was it easy or hard to figure out what was wrong? Did it help tell you what you had to change to find a solution, or was that one of the hardest bits? It's the hardest part, yes. If the solver says, okay, I cannot find a solution, then you're basically very stuck, because if you look at the debug logs that these solvers output, first of all, they're very, very long. They're very hard to read. And you usually cannot just do it visually, because you have so many dimensions that it just doesn't work out. So you essentially have to cut down your problem to fewer sessions that you need to schedule until you find a working one. And then you can take it from there. You add more sessions, and at some point you find, okay, now it failed, and you use this kind of backtracking approach to figure out which constraint is actually causing the problem. Another question. How many conference organizers are here? Okay, which conference are you doing? Sorry? Hi, is it like a hackathon conference? No. No? Okay. Very good. So maybe I have a question. Would it be possible to split it up into smaller problems? For example, treat every day individually and assign some speakers to each day, because you know the numbers already. You have to juggle things a little bit. Or are there any other ideas for splitting it into smaller problems to overcome this exponential run time? Yeah, what you can do to solve this is, for example, to look at the different tracks and then optimize the different tracks individually.
This works. Optimizing for the days would also work, but then of course you have to make sure in some way that all the talks get assigned, and if you split it up too much, you can run into the case that you forgot to assign a talk. Something that happens with EuroPython a lot is that we get talk submissions, we accept them because they look nice, and then we find that people cannot come and we get cancellations. So we often have last-minute changes. Doing that directly at the conference, figuring out a new schedule and replacing those talks, is not always easy. So with the help of some automation behind it, I think it's actually a better solution. What we did find, or rather what Alex found, is that completely relying on these solutions does not really make sense. You can use this as a basis, but you should then always take this basis and work from there mostly manually. Okay, so thanks Marc again.