 Hello, everyone, and welcome to the next talk for the Room 3, me. I am delighted to introduce Tariq Barada, who is going to be talking to us about how to use maths to produce the optimal wedding seating as somebody whose wedding is in a month from today and did maths. I'm really excited to hear this talk. Tariq, where are you calling in from today? Good morning. My name is Tariq, yes. Thank you for the introduction. I'm calling from Vienna, Austria. And I'm very happy to have the opportunity of giving this talk at the EuroPyton 2021. Well, we're delighted to have you. Without any further ado, I'll let you take it away. Thank you very much. So hello, I'm Tariq and I'm a data scientist at the Treyport, specialised on forecast and optimisation models for energy markets. But today, I'm going to take you to a wedding. And hopefully, well, as you know, we're like in the eye of the storm, right between two probable lockdowns. But right now, parties are a load. And for this reason, we can marry. And more specifically, we look at a seating arrangement problem. And you probably know if you've already been at a wedding that to preserve the piece of the wedding, you're going to see the different guests in a way that minimises conflicts. And we'll see how to achieve this in Python using the wonderful PiOMO optimisation and modelling package. It's an open source package. And there is a notebook to this talk, which is available on GitLab. There is a link here on the schedule and probably on the chat. You don't have to download it now, but we'll have a look later and you can look it up by yourself. So you know this kind of situation, right? People invited to a wedding. You've got these brothers-in-law, sister-in-law that can't really go well along with each other. You've got this weird uncle that keeps telling embarrassing jokes and that you have to somehow neutralise. Here we're invited at a wedding with 16 guests. You might argue 16 guests is not a lot, but on the other hand, you probably won't be allowed to invite much more in these times. And you want to allocate them onto four tables with four seats each. Each guest is characterised by a corona index. It's a number between minus five and plus five, which measures the opinion of a guest with respect to the corona pandemics. Ranging from, well, minus five would be like the absolute conspirator who is deeply convinced that corona is just an invention by big pharma and big Bill Gates. And on the opposite of the spectrum, you've got the total science freaks, positivists who can't wait to get their third shot of this Pfizer-BioNTech. And you'd like to, let's say, have rather homogeneous tables where people with relatively, well, compatible opinions sit together. The other property that you have that each guest is characterised by his or her gender for simplicity will keep it binary and say that females are one and males are zero. And the question is, how can we arrange the guests so that people with relatively close opinion with regard to corona sit at the same table? So for that, we compute a corona distance for each, well, for each pair of guests, G1 and G2, we just take the difference of the absolute value of the difference of the scores. And we can display it on this heat map. So it's a kind of representation, which you've probably already seen from actual maps when you have the distance between two cities. The darker the cell, the higher the distance. And what you want to avoid is a situation where Sandrine Flipe and Lucius Motif, for example, would keep arguing on corona the whole evening. So we define the distance for a given table as the sum over all pairs of guests of the corona distance. And the factor of one half here is simply to avoid counting the same pair of guests twice. So how could we solve this? Well, there is obviously a simple solution, which is to sort the guests by their corona index and fill the tables accordingly. If you do this, well, you'll see that indeed it works. But does it still work if we add new constraints? And which are these new constraints? Well, first, we'd like to achieve some kind of gender mix, gender balance. So we want to have at least one female and one male guest per table. And most importantly, we've invited the Krasinski. They used to be a couple long time ago, but they're diverse and they hate each other, quite frankly. It's not an option to not invite any one of them because otherwise the other one would be terribly offended. But believe me, you don't want them to sit at the same table because otherwise the wedding is going to be a nightmare. So we see already that by simple sorting, it doesn't work anymore. So what could we do about it? A solution we'd always work is brute force iteration. So we simply go through all the possible ways of arranging, of seating the people and see whether it works or not. Well, how many combinations are there by the power of factorials? If you want to seat 16 people on 16 chairs, you've got roughly 21,000 billion ways of doing this. That's a big number. However, if we say that the order of the guests on the given table does not matter, we've got like only 60, 63 million distinct table arrangements. So one solution would be to go through each of these 63 million table arrangements. Check if it is a valid one in the sense that we have enough male and female per table and that the Krasinski don't sit at the same table, please not. And if it's a valid arrangement, measure the corona distance of this arrangement and compare it to the best one we found so far. And if it's better, then we would save this solution as the best one so far. If we iterate through all the solutions, then eventually we'll find the best solution. So in the notebook you can check. I'm making an estimate. I'm estimating the time it would take. So if you really go through the 21,000 billion combinations, redundant arrangements, these would take years. However, if we already find a way to determine this distinct arrangement, then it would roughly last on my laptop half an hour. So it works, but it's quite time consuming. And for this reason, there is actually another way of dealing this kind of problem. And it's by recognizing them as a so-called optimization model under constraints. What we want to do to rephrase it is to minimize the total corona distance on each table. This is something that we can call our objective function, subject to a certain number of constraints. And these constraints are that each guest sits at one and only one table. There are no more guests than seats on each table. There is at least one male guest at each table, at least one female guest as well. And Julius and Samantha Krasinski do not sit at the same table no way. These are the ingredients of an optimization model. Typically, such a model consists of variables. So these are the things you do not know in advance and that you would like to optimize to find the optimal value of the decision variables. For example, for a given guest and a given table, whether the guest sits at the table, yes or no. So it would be a binary decision variable. You've got parameters which are given numbers that specify your problem. For example, the number of seats per table. They're known in advance, right? You've got constraints which are equations or equations which link variables and parameters together. You've got an objective function. Here it's our corona distance that we want to minimize, but you could also maximize something. And eventually there is something which is not strictly speaking necessary to an optimization model, but it's very convenient to have sets of indices to index the other modal components on. We'll see an example in a second. So it's common to express these models in a pretty standardized way and to pass them to a so-called optimization solver, which is an external program which is capable of ingesting these standardized forms. So it's basically big matrices. Solve, find the optimum using some clever algorithms and return you the result. So here we're using to model a problem. You will be using Pyomo. So it's the Python. Well, Pyomo probably stands for Python optimization and modeling package. It's a wonderful package full of tools to solve to express optimization problems. So essentially you write down equations in math and they're turned into code to send the problem to a solver. And when it's solved to collect the result and bring them back to Python. And as a solver, we're using the CBC coin or a solver. So BC stands for branch and cat. This is the algorithm that's used for this type of optimization problems. And both Pyomo and CBC, the CBC solver are currently hosted or taken care of by the coin or our computational infrastructure for operations research foundation. So it's all open source. And operation research is the branch of mathematics that deals with this kind of problems. So let me walk you through how such a Pyomo model looks like. So here I'll show you an exemplary, very simple model. It's not complete. The syntax is I've simplified the syntax. So what you see on the screen wouldn't compile wouldn't run. However, if you look in the notebook, you'll see a running example. So you first define an empty model, a Pyomo model, and then you can add ingredients by ingredients. You can add the sets. For example, it's very convenient to have tables, a set of tables, simply the name of the table so you can, you can, you have a handle to them. Then you can add parameters which may or may not be indexed on a set. For example, the table capacity, the number of chairs per table is a parameter which is indexed on the set of tables. Avocado table has four seats, etc. You can add the variables, the one that you're going to look for. And for example, one variable we'd be very interested in is this binary, this set of binary variables called guest seats at. It's indexed on the Cartesian product of guests and tables. And it tells you it's binary one or zero and it tells you whether a given guest sits at a given table, yes or no, right. And this is what you are going to need. Well, this defines a seating arrangement, a combination of these binary variables. You can add constraints, which again may or may not be indexed on the sets you've defined previously. An obvious constraint is we want one table per guest, so each guest is assigned to one and only table. And you see that such a constraint object, well, first it's indexed on a, on a, sorry, on a, on a set. And then it contains a rule. This rule is an equation, a mathematical equation written in Python. This one tells us that if you sum the binary variables guest seats at for a given guest, but if you sum it over all the table, then this must be equal to one. It's like a one-hot encoding. For a given guest, this variable will be zero for all tables, excepted one. And this way we ensure that the guest is only assigned to one table. We add eventually the last bit of the model. The model is the objective function. The objective function is again a function, an equation. Well, it's actually a relationship, an equation, which can be outsourced, sort of defined in a function that would return such kind of equation. So you can use all the usual Python object, which measures the total corona distance. So the sum of the distances of the corona distances on each table. And you have to supply the direction of the optimization. So here we want to minimize the disagreement between guests and ensure a peaceful wedding. Your model is ready to be solved. You pass it to a solver. It's a handle to an actual solver program. And when it's solved by the solver, you can retrieve the values, the optimal value, meaning the values of the variables that minimize the objective function. And you can check, for example, that Julius Krasinski is sitting on the coconut table. But that Samantha Krasinski isn't, which is a great source of relief. So now we're going to switch and go to the notebook. So again, I said this notebook is zoom a bit. Hope you can see, hope you can see well. So as I said, this notebook is on the, should be on the, or even the chat, and you can also execute it if you want. So I'm going to quickly go through the beginning. So it's all things we've already discussed. But in the notebook, it's explained in many more detail. And come to the part where we expressed, we write down the model using Pyomo. So I won't go into the details of all the constraints, obviously. But here, at least you see like the real syntax of Pyomo. So we declare a model. We declare our sets, a set of guests and a set of tables. We declare our parameters, some being scalar, for example, the minimum number of females per table. It's a scalar number. But we can also define parameters which are indexed on a set of tables, for example. The corona distance is an interesting set because it's indexed on the cross product of guests with themselves, right? Because the distance is defined for each pair of guests with some trivial results, like the fact that the distance between yourself is usually zero, because you tend to agree with yourself. But you need to declare all these sets. And we declare the variables, again, telling what set they are indexed on and which domain they belong to, whether it's binary, it could be in some of the problem's continuous variables. And here we've got two sets of variables. We've already seen the guest seats at set of variable, telling us whether a given guest sits on a given table. But we also need a second set of variables called two guest seats at, which tell us whether a given pair of guests sit at a table. I'm going to need this because of the equations which compute the distance, because the distance is always defined within two guests, right? And then we'll add the constraints. We've already seen the constraint that we have one table associated to each guest. We have a constraint telling us that the number of guests sitting at a table might not exceed the number of chairs. We've got constraints to tell us that there should be at least a certain number, at least one female per table and one male. And then there is a slightly more complicated constraint, which is here to define this variable two guest seats at G1 and G2. Sorry, two guest seats at a given table. So G1 and G2, two guest seats at table T. Obviously, we would like to write down that this is the product of the binary variables telling us whether a given guest sits at a table. However, this constraint is not a so-called linear constraint. It's a product of two variables. And for reasons I don't want to go right now into, we prefer to keep it linear so that the problem remains a so-called mixed integer linear problem, which are known to be easier to solve. There is a trick for that. And eventually we write down the Krasinski exclusion principle, which makes sure that Samantha Krasinski and Julius Krasinski do not sit at the same table. Eventually, we define our objective function. As the total corona distance, well, we define it in two steps. First we define for each table the total corona distance on the table, which is obtained by summing the distance of all pairs of guests using these variables to tell us whether the two guests sits at the table or not. The sum will only contain terms when the two guests sit at the table, and we sum this over all the table. And now the stage is ready. We can pass the model to a solver and solve it. You already see that it takes some time. And by the way, when you solve a model, you can also pass as an argument the certain time limit, which is a time up telling you, well, after let's say two minutes, 120 seconds, just give me the best solution you've found so far. And we see here that after 17 seconds, the solver declares that it has found an optimal solution. And the objective function, so the total corona distance for this solution, presumably optimal solution, is 29.18. We can now extract the result, the optimized values from the model. And in particular, we are interested in how people seats. So let's look at this. So this is the actual result. So we see that each guest is allocated to one given table. So here I've ordered them by table that you can check that there is for each table at least one male and one female. And you can check that Samantha Krasinski is on the coconut table while you use Krasinski it on the avocado table. So actually by, well, you can try to convince yourself or to get some feeling about whether this solution is optimal or not by, for example, looking at the average corona index per table where you see that it fairly kind of evenly samples the space minus 5 plus 5. And if you look at the standard deviations of the dispersion of the corona index here on each table, you feel you see that it's rather narrow so that the model tries to achieve a relatively narrow distribution of corona index per table. And of course having to do a trade-off between minimizing the corona distance but still, for example, having the Krasinski is not sitting at the same table although they have actually very similar opinions on this matter. But actually how can you be sure that the solution is optimal? Well, it's pretty easy when you have a solution to check that it is valid. You can check the seating arrangements. You can check easily that there is one male, one female per table and that the Krasinski do not sit at the same table. But how can you check that it's optimal? Either you can prove it mathematically but you can prove that the algorithm which is implemented by the solvers so this so-called branch and cut returns an optimal solution. And apart from that, well, it would be the brute force way. What you can do, it's not a proof, of course, but what you can do is to collect some, let's say, anecdotal evidence. Well, you can try to generate that random seating arrangements, retain those who are valid and compute the objective function, the corona distance and check that indeed all the solutions you find have a worse, larger corona distance than the optimum. So first, let's just check that the optimal arrangement is a valid solution. We do kind of an independent check outside of the model and it is. It has a distance of 29.2 and let's look at 10 random seating arrangements which are valid and print the total corona distance and fortunately all of them are worse and very often much worse than the optimal arrangement. Of course, it's not a proof but it suggests that it's quite hard to beat the optimum and actually it's not possible to beat the optimum by definition. So coming back to the slides, I am glad to tell you that we've reached the happy end of this talk. So we have found an optimal seating arrangement after roughly 10, I think this started with 17 seconds on a laptop whereas by brute force iterating through all the solutions we would have taken roughly half an hour or 20 minutes. So it's faster. In practice, very often in real life what happens is that you have a limited budget and time. Your boss pops up into your office and says, well, I want the best solution within two hours, 10 minutes. So what you can do is to interrupt the solving after a certain time up and keep the best solution so far which very often happens to be good enough. But the cool thing about these optimization methods is that they usually give you a better solution because you could also brute force iterate and then solve after a given time. This would also give you a best solution so far but usually these algorithms would come closer to the actual optimum or the actual optima because there may be a difference. There may be a lot of degenerate solutions, so solutions which are equally optimal. The cool thing about PiOMO is that PiOMO is a modeling language. It translates math into code. There are other modeling languages around, but this one is written in Python and it leverages on the power of Python as a programming language. You can get data from Python into PiOMO, send to the solver, get the result back, process it in PiOMO, so it's really integrated. It's not the only one. As an alternative, I would cite PULP which is a very good alternative for, let's say, simple optimization problems, in particular so-called mixed integer linear problems such as the one we solved today. For what we've seen today, PULP would have been more than enough. However, I think that PiOMO has a much wider range of problems it can solve, including, and here I'm just dropping names, including nonlinear optimization and stochastic optimization. And eventually, what I'd like to give you as a take-home message is that many real-life situations, particularly in the industry, can be expressed as optimization models in logistics, in economics, finance. Here at Trayport, we're using these kind of techniques to optimize energy systems and energy trading. Eventually, it can be used to optimize the arrangement of the guests at your wedding. It's probably not the only issue you have to tackle at a wedding, but at least this one is solved and you can fairly hope that you'll have a peaceful wedding. With this, I thank you very much for your attention and I'd be very glad to answer your questions, whether now, if we have some time, or at our virtual booth, because Trayport is a sponsor of this conference and we'd be happy, my colleagues and me, we'd be happy to answer your questions and tell you more about Trayport. Thank you very much. Fantastic. I really, really enjoyed that. Thank you so much. I'll have to adopt some of that methodology for planning my seating plan. We do have a question on the chat from Fabian. Fabian asks, have you experimented with how many guests the model solver can deal with, or how those unlinked constraints of, for example, divorced guys influence running times if there were more of those? Thank you for the question. I didn't experiment systematically, but what we could say is that if you look at the brute force way or simply if you look at the number of combinations they are, it scales very badly. I think like even worse than exponential. So if you're looking at, let's say, the number of arrangements, assuming that you have a certain number of guests per table and which is constant, but you increase the number of tables, it will, well, if I didn't do anything wrong with the math, it's like M to the power, the number of tables to the power of the number of tables, so probably something like exponential. So brute force iteration scales really badly. How about the branch and kettle algorithm? And, well, I couldn't really give you a definitive answer. As far as I understand, there can always be pathological cases where it scales also really badly, but in practice, it's much faster in a lot of real cases, but still for this kind of problems, branch and kettle, even increasing, well, if you double the number of table, for example, you've got 10 to the 16 times more arrangement, so even in this case, branch and kettle wouldn't help you that much, to be honest. Thank you, very comprehensive answer. Ah, we have another question. Can pyome be used with a different solver than coin or CBC? Is that one you recommend? Okay, so yes, pyome can be used with any, well, all the standard solvers, there are commercial solvers, which are like Grobi or Cplex, which is very powerful, but commercial. CBC is a very good open-source, the best open-source solver I know, I'm aware of, so I would definitely recommend using this one unless you're limited. Otherwise, if you have a big problem that you can't solve on your computer, what you can do is to use remote infrastructure. You can use those program hosted by, I think universities across the world. It's also compatible with pyome, we just need the internet connection. It sends your problem through the internet and you get back the solution. So it's for like one-shot experiments, it's good. You wouldn't use it for your startup. Excellent. We have another question. It looks like you really need to know the underlying data structures for defining constraints, with the need to create supporting intermediate data structures. Do you know about a collection of common constraining patterns to make this easier? I'm not sure. I get the question. In my understanding, you need to really well define, yes, you need to know everything. You really set right down the equation, so you need to have a pretty clear idea of the data you're working with. Of course, like in which kind of Python data structure the data is, doesn't really matter. You just put it in the end in more or less dictionaries. I guess this is not really what you're asking for. My question is how far do you need to understand the maths behind it in order to be able to use Pyomo effectively? My guess would be you do need to know some of the theory fairly solidly. That's true. You need to write down the equations by yourself. Hopefully, you can also get some inspiration from existing models on the, let's say, your gallery on GitLab or books, et cetera. All right. I think that's all of the questions we had from the chat, and we are at time. Thank you so much, Tariq. Thank you very much. Happy wedding.