Hello, everyone. Good afternoon. Thank you all for coming, or for staying; let's put it that way. It's very nice of you. I'm basically just standing here because another speaker couldn't come. He was going to talk about pyasn1, which is a really nice library. I can recommend it; I've used it myself. Decoding certificates with that library and then working on them is not the most common thing you do, but it's definitely worth a look. What I'm going to talk about is something that I presented at a user group meeting in Düsseldorf in January this year. I sat down yesterday and translated the German talk into English, and that's what you see here. So this is a talk about PuLP. Oh, yeah, that's me. I'm Marc-André Lemburg. I've been around for quite a while; the 42 on the slide is not correct anymore. I'm a member of the Python Software Foundation and of the EuroPython Society, and I was a core developer for a long, long time. But I'm going to talk about linear programming now, which maybe doesn't sound that interesting. How many of you know what linear programming is? Actually, a few. That's good. One of the things about linear programming is that it always gets misunderstood, especially by programmers, because it doesn't have much to do with programming. The term originated in operations research, which is a field of mathematics. For mathematicians, "programming" means: given some planning problem, find an optimal solution to it, and find ways of doing that. And "linear" because it restricts itself to linear relationships between variables, both for the objective and for defining constraints. Now, "linear" always sounds very, very easy. In fact, the problems that are solvable with linear programming are usually very easy to understand.
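For readers who want to see what "linear relationships" means as a formula: a linear program in one common textbook form (the notation is ours, not taken from the talk's slides) has variables $x_j$, objective coefficients $c_j$, and constraint data $a_{ij}$, $b_i$:

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, \dots, m, \\
                       & x_j \ge 0, \qquad j = 1, \dots, n.
\end{aligned}
```

Every term is a constant times a single variable; products of variables are exactly what this form excludes.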
But finding the solutions is usually very hard, especially for larger problems. The other thing we're going to talk about is integer programming. Integer programming is a subfield of linear programming where you have additional constraints. In linear programming, variables can take floating point values, so basically any value you can think of. In integer programming, you restrict yourself to integers, so the solutions you find may only contain integer values. The most common case is Boolean values, where zero and one are the only possible values. This makes things very tricky, and you often end up with exponential runtime because of it. There are some very popular problems of this kind, like the knapsack problem. You probably know it: you have a knapsack, a backpack, and you want to fill it with stuff. You want to find the optimal way of packing it so that you can carry as much as possible. Think of a bank robber who goes into a jewelry shop and wants to pick the most valuable things to put into the backpack. That kind of problem. The next one is the traveling salesman problem. A salesman has to do a tour through the country and wants to find the optimal order in which to visit the cities and which route to take. And a problem that I'm very much interested in is conference scheduling. That's also the example I'm going to show here, because I'm one of the organizers of the EuroPython conference. We have way over 100 talks to schedule every year and lots and lots of talk slots. Doing the scheduling is usually a lot of work, doing it by hand is even more work, and it often goes wrong. So this would be a nice way of doing it automatically, or at least of helping with it.
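The knapsack problem is the classic 0/1 integer program. A minimal PuLP sketch, assuming made-up loot (the names `items` and `capacity` and all values and weights are ours): one binary variable per item, the total value as the objective, and a single capacity constraint.

```python
import pulp

# Hypothetical loot: item -> (value, weight)
items = {
    "ring": (60, 1),
    "necklace": (100, 2),
    "watch": (120, 3),
}
capacity = 5  # hypothetical maximum weight the backpack can hold

prob = pulp.LpProblem("knapsack", pulp.LpMaximize)
# take[item] is 1 if the item goes into the backpack, else 0
take = pulp.LpVariable.dicts("take", list(items), cat=pulp.LpBinary)

# Objective: total value of what is packed
prob += pulp.lpSum(items[n][0] * take[n] for n in items), "total_value"
# Constraint: total weight must not exceed the capacity
prob += pulp.lpSum(items[n][1] * take[n] for n in items) <= capacity, "capacity"

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = sorted(n for n in items if pulp.value(take[n]) > 0.5)
print(pulp.LpStatus[status], chosen, pulp.value(prob.objective))
```

Taking all three items would weigh 6, so the solver has to decide which combination within the limit is worth the most.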
So let's do some mathematics. Linear programming works like this. You have variables; those are the things that can be adjusted. Then you have an objective function, which defines how much your solution is worth. It is a function of the variables you define, it gives you some value, and you want to minimize or maximize that value. Then you have constraints. This is probably the most important part, especially for conference scheduling. The constraints restrict which values the variables may take, and with them you can describe lots and lots of different situations. What we want is an optimal solution, ideally one which minimizes or maximizes the objective function. Right. So now we come to PuLP. Does anyone of you know PuLP? Yeah, you do. I didn't know PuLP until a few weeks ago. PuLP is a library written in Python. It provides a standard interface to solvers for linear programming problems. It's part of COIN-OR, which is a collection of standard libraries for operations research, and you can see the websites here. PuLP itself comes with a solver. The solver is the part that actually finds the solution, or tries to find it. There are other solvers you can use which may be much faster, like GLPK, for example, or commercial ones like CPLEX. There's one more that was mentioned on the PuLP site as well, which I don't know. And there's the URL. You can download it, or you can just pip install it if you want to use it, so it's very easy to get. PuLP uses a few data types, because of course you have to tell the program what your problem looks like. The first thing you do is create an LpProblem object. Then you add the other parts of your problem to that object, using the other types that PuLP defines.
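The workflow just described, as a minimal sketch on a toy two-variable problem (all names and numbers are illustrative, not from the talk's slides). It uses the CBC solver that ships with PuLP, with its log output switched off:

```python
import pulp

# 1. The problem object: a name plus the direction of optimization
prob = pulp.LpProblem("toy_example", pulp.LpMaximize)

# 2. Variables with a lower bound (continuous by default)
x = pulp.LpVariable("x", lowBound=0)
y = pulp.LpVariable("y", lowBound=0)

# 3. A plain expression added to the problem becomes the objective function
prob += 3 * x + 2 * y, "profit"

# 4. Comparisons produce constraints, which are added the same way
prob += x + y <= 4, "limit_1"
prob += x + 3 * y <= 6, "limit_2"

# 5. Solve and read back the values
status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[status], pulp.value(x), pulp.value(y), pulp.value(prob.objective))
```

The optimum sits at a corner of the feasible region, here x = 4, y = 0, which a quick check of the corner points by hand confirms.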
One of the other types is LpVariable. You set up LpVariables, the x variables that I had on the other slide, and you can define ranges for them. You can say whether a variable is a float, an integer or a Boolean. That way you tell the LpProblem what your problem is all about. For constraints, you do the same thing: you have LpConstraint objects, and in those objects you put the constraints that you have. For example, you say that a sum of certain variables, weighted with some constants, has to be smaller than or equal to some other value, or equal to it. The library also has some other features you can use. For example, you can make constraints elastic, so that the solver can wiggle a bit and maybe find better solutions, and you can see how a certain constraint affects your possible solutions. When writing the slides, I found that the documentation was not that great. So I had to go to the web, find a few blog posts and learn from those. Plus, I read the source code to figure out how it works. These are the URLs; you'll probably have to look them up as well in order to understand it. What you can also do is watch this talk by David McEver at PyCon UK 2016, last year. That was the inspiration for looking at PuLP and trying to do conference scheduling with it. And maybe for 2017 we're going to actually use it. So let's take that example and try to figure out how PuLP can help us with it. What we want is to make scheduling simple. Ideally, we want to optimize speaker and attendee satisfaction. This basically means that the speakers want to have nice rooms, and the attendees want to see all the talks they want to see, ideally with no overlaps.
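A short sketch of the three variable kinds and of an explicit LpConstraint object, assuming the PuLP 2.x-style constructor `LpConstraint(e, sense, name, rhs)`; the toy numbers are ours, and elastic constraints are not shown.

```python
import pulp

prob = pulp.LpProblem("variable_kinds", pulp.LpMaximize)

# Continuous (float) variable with only a lower bound
f = pulp.LpVariable("f", lowBound=0, cat=pulp.LpContinuous)
# Integer variable restricted to the range 0..10
i = pulp.LpVariable("i", lowBound=0, upBound=10, cat=pulp.LpInteger)
# Boolean variable: 0 or 1
b = pulp.LpVariable("b", cat=pulp.LpBinary)

prob += f + 2 * i + 3 * b, "objective"

# An explicit LpConstraint object meaning: f + i + b <= 3.5
capacity = pulp.LpConstraint(
    e=f + i + b, sense=pulp.LpConstraintLE, rhs=3.5, name="capacity"
)
prob += capacity

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[status], pulp.value(f), pulp.value(i), pulp.value(b))
```

Writing `prob += f + i + b <= 3.5, "capacity"` builds the same constraint through operator overloading, which is the more common style in PuLP code.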
Of course, we have lots and lots of constraints. These are just a few of them. We have multiple rooms of different sizes, but I'm not going to go into that detail here. What we do have is talk slots of varying lengths: 30-minute slots, 45-minute slots, 60-minute slots. Then, of course, we have talks of varying lengths. Speakers cannot give two talks at the same time, which is something that sometimes happens if you schedule manually. Speakers also have availability constraints. For example, some speakers may only be able to give a talk on Tuesday or Wednesday, and not on Monday, the first day of the conference. You want to capture that as well. So how do we do this? First, of course, we import pulp. Then we define a few things, like the rooms and their sizes. We have a few talks here, with their durations. The next thing is to define the slots that are available. As you can see, each slot has a room, A for example, then the start time of the slot and the duration of the slot. You can see here that the time slots are not all the same length. Then you start defining the problem by creating the LpProblem object and adding things to it. So you start with your problem object, and then you define your first variables. The model I'm using here is a very simple one. You have an assign variable which says that this talk is going to be in that slot, and you define one for every combination of talk and slot. It's going to be a binary variable; you can see that the category is "Binary". So it can only be zero or one, and in the solution it will be one if the talk is in that slot and zero otherwise. Right, and then you have to formulate the constraints that I just mentioned. For example, you can only assign one talk per slot.
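The data and the assign variables might look roughly like this. It is a sketch with hypothetical talks and slots (ids such as `intro_python` and `A1` are invented, times are minutes after midnight, and room sizes are left out), not the code from the slides. `LpVariable.dicts` with a tuple of index lists creates the nested `assign[talk][slot]` dictionary.

```python
import pulp

# Hypothetical talks: talk id -> duration in minutes
talks = {"intro_python": 30, "asyncio_deep_dive": 60, "pulp_scheduling": 30}

# Hypothetical slots: slot id -> (room, start in minutes after midnight, duration)
slots = {
    "A1": ("A", 9 * 60, 30),
    "A2": ("A", 9 * 60 + 30, 60),
    "B1": ("B", 9 * 60, 30),
    "B2": ("B", 10 * 60, 30),
}

prob = pulp.LpProblem("conference_schedule", pulp.LpMaximize)

# One binary variable per (talk, slot) combination:
# assign[talk][slot] == 1 means "this talk goes into that slot"
assign = pulp.LpVariable.dicts(
    "assign", (list(talks), list(slots)), cat=pulp.LpBinary
)

n_vars = sum(len(row) for row in assign.values())
print(n_vars, "binary variables, e.g.", assign["intro_python"]["A1"].name)
```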
So you can't have two talks in the same slot. This is very easy to express: the standard way is to take the sum over all the talks in that slot, and it has to be lower than or equal to one. Next, we need to assign all talks, because of course we want to give all the speakers a chance to talk. And the talks must fit the talk slots, so the durations have to match. Again, you use the sum function for that. In the first constraint you say the sum must be equal to one, so every single talk must be assigned. In the second one you look at the slot durations and make sure that the duration of the slot assigned to a talk is equal to the talk's duration. Then you have additional constraints. In this case, for example, the speaker of "Introduction to Python" cannot start early in the morning for some reason. Maybe there was a party the day before, I don't know. So you add an extra constraint saying that this talk cannot be associated with those slots: for all the slots that start early, the assign variable has to be zero, so the talk cannot be put there. Right, those were the easy parts. Now here's a more complex example. You need to deal with problems like a speaker giving more than one talk. Plus, you have different slots per room, and the slots have different durations, so you need to make sure there are no overlaps between them. For example, if a speaker gives more than one talk, the solver could come up with a solution which looks nice, but in which the speaker would have to give two talks in two different rooms at the same time. So you need to address that.
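Putting the constraints from this part together in one small runnable sketch. The data is hypothetical, and so is the objective: the talk doesn't specify one at this point, so the sketch assumes "prefer bigger rooms" to give the solver something to optimize. "Early" is taken to mean starting before 10:00.

```python
import pulp

rooms = {"A": 200, "B": 80}  # hypothetical: room -> seats
talks = {"intro_python": 30, "asyncio_deep_dive": 60, "pulp_scheduling": 30}
slots = {  # hypothetical: slot -> (room, start minute, duration)
    "A1": ("A", 540, 30),
    "A2": ("A", 570, 60),
    "B1": ("B", 540, 30),
    "B2": ("B", 600, 30),
}

prob = pulp.LpProblem("conference_schedule", pulp.LpMaximize)
assign = pulp.LpVariable.dicts("assign", (list(talks), list(slots)), cat=pulp.LpBinary)

# Assumed objective: prefer bigger rooms
prob += pulp.lpSum(rooms[slots[s][0]] * assign[t][s] for t in talks for s in slots)

for s in slots:
    # At most one talk per slot
    prob += pulp.lpSum(assign[t][s] for t in talks) <= 1, f"one_talk_{s}"

for t, duration in talks.items():
    # Every talk gets exactly one slot ...
    prob += pulp.lpSum(assign[t][s] for s in slots) == 1, f"assigned_{t}"
    # ... and the assigned slot's duration equals the talk's duration
    prob += pulp.lpSum(slots[s][2] * assign[t][s] for s in slots) == duration, f"fits_{t}"

# The intro speaker cannot start before 10:00 (600 minutes after midnight)
for s, (_room, start, _dur) in slots.items():
    if start < 600:
        prob += assign["intro_python"][s] == 0, f"not_early_{s}"

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
schedule = {t: s for t in talks for s in slots if pulp.value(assign[t][s]) > 0.5}
print(pulp.LpStatus[status], schedule)
```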
Here's how you do this. If you tried to solve it directly, you'd have to tell the solver that all the slots it assigns have to be consecutive, because you don't want to start one part of a talk early in the morning, continue it in the middle of the day and finally finish it in the evening, which would otherwise be a possible solution. So, because the slots have various durations, you subdivide them into smaller blocks, and that's what you do here. You define additional block assignment variables. The smallest block is 30 minutes, and the slots can be 30, 60 or 90 minutes long in this example. Then you define some helper mappings. I'm not going to go into detail here; they basically figure out how many blocks there are per slot. Once you have that, you can tie the blocks you've defined, which are basically the Lego bricks, to the larger talk slots, and you do that again using a constraint down here: it says that a talk slot has to be composed of a certain number of blocks. Using those blocks you can then define the constraint that if a speaker gives more than one talk, there is no overlap between the assignments for that particular speaker. That's what you do here: again you use the sum, and you add a constraint that the speaker may only be assigned to one block in the schedule at any given time. Right, and then you're finally done. You've defined your problem, you just call the solve method, and then you check whether the status is one or not.
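A sketch of the speaker-overlap constraint. One substitution to note: instead of the extra block assignment variables described in the talk, this version precomputes which 30-minute blocks each slot covers and sums the existing assign variables over those blocks. The data, the `blocks_of` helper and the "start as early as possible" objective are all assumptions for illustration.

```python
import pulp

# Hypothetical data: talk -> (speaker, duration); slot -> (room, start, duration)
talks = {
    "alice_keynote": ("alice", 90),
    "alice_tutorial": ("alice", 30),
    "bob_talk": ("bob", 60),
}
slots = {
    "A1": ("A", 540, 90),  # 9:00-10:30
    "B1": ("B", 540, 30),  # 9:00-9:30
    "B2": ("B", 570, 60),  # 9:30-10:30
    "B3": ("B", 630, 30),  # 10:30-11:00
}
BLOCK = 30  # smallest time unit in minutes

# Helper mapping: the 30-minute blocks (by start minute) each slot covers
blocks_of = {s: set(range(start, start + dur, BLOCK)) for s, (_r, start, dur) in slots.items()}
all_blocks = sorted(set().union(*blocks_of.values()))

prob = pulp.LpProblem("no_speaker_overlap", pulp.LpMinimize)
assign = pulp.LpVariable.dicts("assign", (list(talks), list(slots)), cat=pulp.LpBinary)

# Assumed objective: start talks as early as possible
prob += pulp.lpSum(slots[s][1] * assign[t][s] for t in talks for s in slots)

for s in slots:
    prob += pulp.lpSum(assign[t][s] for t in talks) <= 1, f"one_talk_{s}"
for t, (_speaker, duration) in talks.items():
    prob += pulp.lpSum(assign[t][s] for s in slots) == 1, f"assigned_{t}"
    prob += pulp.lpSum(slots[s][2] * assign[t][s] for s in slots) == duration, f"fits_{t}"

# A speaker may occupy at most one slot during any 30-minute block
for sp in sorted({speaker for speaker, _dur in talks.values()}):
    for b in all_blocks:
        terms = [
            assign[t][s]
            for t, (speaker, _dur) in talks.items() if speaker == sp
            for s in slots if b in blocks_of[s]
        ]
        if len(terms) > 1:  # a single binary term can never exceed 1 anyway
            prob += pulp.lpSum(terms) <= 1, f"speaker_{sp}_block_{b}"

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
schedule = {t: s for t in talks for s in slots if pulp.value(assign[t][s]) > 0.5}
print(pulp.LpStatus[status], schedule)
```

With the early-start objective alone, the solver would put `alice_tutorial` into B1 at 9:00 while her keynote is running in room A; the per-block constraint pushes it to B3.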
The problem here is that PuLP is not designed the usual way Python programs are, where you raise an exception if something goes wrong, and that's non-intuitive. I spent quite a long time figuring this one out, because I kept wondering why the solution did not match my constraints. In the end it turned out that the solver hadn't found a solution, but PuLP didn't raise an exception. So it's very important to check the status: if it's one, a solution was found; otherwise no solution was found. The status also gives you a hint as to why, because sometimes your constraints simply rule out any solution. Then, of course, you have to output the solution in some way. To do this, you take the variables and call PuLP's value() function to actually get at the value stored in those variable objects. This is another tricky thing. Because I used Boolean variables, I thought I could simply test the variable directly, like you normally would in Python, but it turns out that the variable objects you define are always true in PuLP. So that doesn't help you get at the actual value; you always have to use this function. That was another gotcha I found while doing this. And this is the output if you put everything together and let it run. This is more like the raw data and doesn't look that nice, so you then convert it into a schedule. That's something you can actually put on the website and use for your conference. So, problem solved, for this small problem. Of course, you can do a lot more than what I showed here. One of the things you'd obviously have to take into account is room sizes: you usually have rooms of different sizes, a big one and smaller ones.
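Because `prob.solve()` reports failure through a status code rather than an exception, a small wrapper can turn a non-optimal status into an error. `solve_or_raise` is a hypothetical helper, not part of PuLP; `pulp.LpStatus` maps a status code to a readable name such as "Optimal" or "Infeasible".

```python
import pulp


def solve_or_raise(prob):
    """Hypothetical helper: solve, and raise if PuLP did not report an optimum.

    prob.solve() returns a status code instead of raising; 1 means "Optimal".
    """
    status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if status != pulp.LpStatusOptimal:
        raise RuntimeError(f"no solution found: {pulp.LpStatus[status]}")
    return status


# A feasible toy problem
ok = pulp.LpProblem("feasible", pulp.LpMaximize)
x = pulp.LpVariable("x", cat=pulp.LpBinary)
ok += x
solve_or_raise(ok)
# Read the value with pulp.value() instead of testing the variable object itself
print("x =", pulp.value(x))

# An infeasible one: a binary variable can never be >= 2
bad = pulp.LpProblem("infeasible", pulp.LpMaximize)
y = pulp.LpVariable("y", cat=pulp.LpBinary)
bad += y
bad += y >= 2, "impossible"
try:
    solve_or_raise(bad)
    infeasible_detected = False
except RuntimeError as exc:
    infeasible_detected = True
    print(exc)
```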
In a small room only very few attendees fit, so you don't want to put something very popular there. Then there are tracks, like what we have here, for example a Python track. We do that at EuroPython as well: you have different topics, you group the talks by topic, and you try to have all the talks for a particular topic in one room, so people don't have to change rooms all the time. This would be possible to do as well. And then, of course, once you've published your schedule, you're inevitably going to have to make changes. For example, the speaker of this talk had to cancel and couldn't come, so you want to reassign that particular slot to someone else. You can rerun the whole solver and have it find a new schedule for you. But normally the solver would just go ahead and reschedule everything, so you'd get a huge number of changes in your schedule, which you don't want. What you can do instead is have it minimize the changes: you take the number of changes between the two schedules, define that as your objective function, and have the solver minimize it. So this is nice, and it's also why I think this approach is really good for conference scheduling: it takes away a lot of the problems people normally have when doing this manually. Of course, it's not always easy to find a good model for these things, for example what I had to do with the block variables. If you're not used to doing these things, and I'm not, it takes quite a while to figure them out. You sometimes have to think really hard about how to do it, because the programming done here is declarative. It's more like writing an SQL query than the imperative programming you normally do in Python.
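Minimizing the number of changes can be written as a linear objective: for every talk that stays in the program, add 1 minus its assign variable for its old slot, which is 1 exactly when the talk moves. A sketch with hypothetical talks and equally long slots, where `talk_b` is cancelled and `talk_d` replaces it:

```python
import pulp

slots = ["S1", "S2", "S3"]  # hypothetical slots of equal length
old_schedule = {"talk_a": "S1", "talk_b": "S2", "talk_c": "S3"}
# talk_b was cancelled and talk_d takes its place
new_talks = ["talk_a", "talk_c", "talk_d"]

prob = pulp.LpProblem("minimal_changes", pulp.LpMinimize)
assign = pulp.LpVariable.dicts("assign", (new_talks, slots), cat=pulp.LpBinary)

# Objective: count the talks that leave the slot they had in the published schedule
kept = [t for t in new_talks if t in old_schedule]
prob += pulp.lpSum(1 - assign[t][old_schedule[t]] for t in kept), "changes"

for s in slots:
    prob += pulp.lpSum(assign[t][s] for t in new_talks) <= 1, f"one_talk_{s}"
for t in new_talks:
    prob += pulp.lpSum(assign[t][s] for s in slots) == 1, f"assigned_{t}"

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
new_schedule = {t: s for t in new_talks for s in slots if pulp.value(assign[t][s]) > 0.5}
print(pulp.LpStatus[status], new_schedule, "changes:", pulp.value(prob.objective))
```

In a real schedule, the other constraints from before would simply be added alongside; only the objective changes.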
You also have to watch out that you don't have constraints where you, for example, multiply variables, because that's not linear anymore, so it's no longer linear programming. The software will not be able to handle this and will simply error out. So that's something to consider. It can be very interesting finding these things, and once it works it's really nice: you really feel like you have a good solution for something. There's a small downside. If you do this for just a few things, like I did here, it takes under a second to find a solution. If you do it for larger problems, where you have hundreds of thousands of variables and states and everything, it can easily become unmanageable. So the model you choose has a lot of influence on the runtime. I can't really say, because I don't have the experience, but I suppose that with more experience you can write better models and get better runtimes. I don't know if you know NP-hard problems. You do? Yeah. Really hard problems. Many of these problems are NP-hard, like the traveling salesman problem, for example. Right. So I summarized the gotchas here. I'm probably going to put this up on the website so you can download it; it's basically a summary of what I just told you. There's one more thing I didn't mention: the binary LpVariables that you define. I found that during solving they can actually take values outside the range you'd expect. For example, in one of the runs that did not produce a solution, I got minus three as the value of a binary variable, which was a bit odd. So it's probably better to test whether the variable's value is greater than zero instead of just doing a Boolean comparison.
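Two of these gotchas in code, as a hedged sketch. First, in the PuLP versions we know, multiplying two variables already fails when the expression is built, with a TypeError. Second, for binary variables a standard linearization (a common modelling trick, not something shown in the talk) replaces x * y with a new binary z and three linear constraints. `linearized_and` is a hypothetical name, and values are read with a threshold of 0.5, our choice for tolerating round-off; the talk suggests testing for greater than zero.

```python
import pulp

# Multiplying two variables is not linear; PuLP rejects it when the
# expression is built (a TypeError in the versions we know of).
p = pulp.LpVariable("p", cat=pulp.LpBinary)
q = pulp.LpVariable("q", cat=pulp.LpBinary)
try:
    _ = p * q
    product_rejected = False
except TypeError:
    product_rejected = True


def linearized_and(x_value, y_value):
    """Hypothetical helper: solve for binary z standing in for x * y."""
    prob = pulp.LpProblem("linearized_product", pulp.LpMaximize)
    x = pulp.LpVariable("x", cat=pulp.LpBinary)
    y = pulp.LpVariable("y", cat=pulp.LpBinary)
    z = pulp.LpVariable("z", cat=pulp.LpBinary)
    prob += z, "push_z_up"  # maximizing z shows the constraints hold it down
    prob += z <= x, "z_le_x"
    prob += z <= y, "z_le_y"
    prob += z >= x + y - 1, "z_ge_x_plus_y_minus_1"
    prob += x == x_value, "fix_x"
    prob += y == y_value, "fix_y"
    status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if status != pulp.LpStatusOptimal:
        raise RuntimeError(pulp.LpStatus[status])
    # Compare against a threshold instead of relying on truthiness or == 1
    return pulp.value(z) > 0.5


print(product_rejected, [linearized_and(a, b) for a in (0, 1) for b in (0, 1)])
```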
Yeah, and the other things I already talked about. As always, it's good to test your solvers: you take a small problem that you can easily check by hand, run it through your solver, see whether it gives you correct solutions, and then you move on to the large problems. Right. So that was PuLP. You can use the external solvers that I mentioned to make it run faster, but there are other ways of making it run even faster. These are a few projects you can look at if you want to go into more detail. Those solvers are also a bit more capable; they can do more than just linear programming. So if you're interested, have a look at these. And that was all. Plus, I wanted to give you a short plug here; I guess this is OK. For EuroPython 2017, we've just announced the tentative dates. We haven't signed the contract with the venue yet, which is why we cannot make them definite, but they're not likely to change anymore. The dates are, I can't really see it, down here: July 9 to 16. It's in summer, so it's nice and warm, but not too hot; the weather is like bathtub temperature. So that should be fun. We haven't announced the CFP yet; we're probably going to wait a bit more with it. We're going to launch the website in the next two weeks, I think. We don't want to do two rounds of CFPs like we did last year. It was too much work, and people got confused and didn't know how it worked. So we're going to do it later, in order to give people a chance to submit really interesting talks closer to the conference, because you often have new developments come in just a few months before the conference, and then you don't have any chance of giving a talk about them.
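Testing on a problem small enough to check by other means can look like this: brute-force every assignment with `itertools.permutations` and compare the result with the solver's objective. The penalty table and names are invented; any instance small enough for cheap enumeration works.

```python
import itertools

import pulp

# Hypothetical penalty for putting talk t into slot s (lower is better)
penalty = {
    ("t1", "s1"): 4, ("t1", "s2"): 1, ("t1", "s3"): 3,
    ("t2", "s1"): 2, ("t2", "s2"): 0, ("t2", "s3"): 5,
    ("t3", "s1"): 3, ("t3", "s2"): 2, ("t3", "s3"): 2,
}
talks = ["t1", "t2", "t3"]
slots = ["s1", "s2", "s3"]

# Brute force: try every way of giving each talk its own slot
best_by_hand = min(
    sum(penalty[t, s] for t, s in zip(talks, perm))
    for perm in itertools.permutations(slots)
)

# The same problem as a PuLP model
prob = pulp.LpProblem("small_check", pulp.LpMinimize)
assign = pulp.LpVariable.dicts("assign", (talks, slots), cat=pulp.LpBinary)
prob += pulp.lpSum(penalty[t, s] * assign[t][s] for t in talks for s in slots)
for t in talks:
    prob += pulp.lpSum(assign[t][s] for s in slots) == 1, f"assigned_{t}"
for s in slots:
    prob += pulp.lpSum(assign[t][s] for t in talks) <= 1, f"one_talk_{s}"

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
best_by_solver = pulp.value(prob.objective)
print(pulp.LpStatus[status], best_by_hand, best_by_solver)
```

Once the two numbers agree on a few such instances, the same model code can be trusted a bit more on the large problem, where enumeration is no longer possible.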
So that's why we're going to wait a bit more. Right. That's it. Thank you. Which one? This one. CVXOPT is about convex optimization, so you don't just have a linear function; you can have a convex function. And CVXPY is an interesting project because it tries to embed the logic for doing the optimization directly into Python. There are other projects that try to embed SQL directly into Python, for example, and this is similar. And then there's another library down here, which is for conic optimization. I've never used that one; I just put it on the slide because it was mentioned. So, yes? "Do you use that for the EuroPython schedule?" Well, like I said, I'm not in the program work group, so I don't have anything to do with the scheduling. But I'm probably going to suggest using this or something like it. Maybe for this year, or maybe we'll do it manually this year and use it next year. We'll have to see. Well, thank you very much. Have a nice Sunday and a good trip home. Thank you to the organizers, thank you to Stefan, and thank you to the helpers.