Thanks so much for coming out to hear me talk about my favorite topic, probabilistic programming. I'll introduce myself real quick: I recently relocated back to Germany after studying at Brown, where I did my PhD on Bayesian modeling of decision-making. For a couple of years I've also been working with Quantopian, a Boston-based startup, as a quantitative researcher, and there we're building the world's first algorithmic trading platform in the web browser. The talk will be tangentially related, so I'm just going to show you a screenshot of what it looks like. This is essentially what you see when you go on the website: a web-based IDE where you can code up your trading strategy in Python. We provide historical data so you can test how you would have done if it were, say, 2012, and on the right here you see how well it did. That's what I'll refer back to: you're interested in whether you beat the market or lose against the market. I should also add that it's completely free and everyone can use it. Okay, so I think at every talk that should be the main question: why should you care about probabilistic programming? And it's not really an easy talk, because to talk about probabilistic programming you need at least a basic understanding of some concepts from probability theory and statistics. So for the first twenty minutes I'll give a very quick primer, focusing on an intuitive level of understanding. Can I get a quick show of hands: who understands, on an intuitive level, how Bayes' formula works?
Okay, so most of you. Maybe you won't even need that primer, but it might still be interesting, and towards the end we have a simple example and then a more advanced example that should be interesting even if you already know quite a bit about Bayesian statistics. To motivate this further, I really like the contrast that Olivier drew in his talk about machine learning. Chances are you're a data scientist and use scikit-learn to train your machine-learning classifiers. What that looks like is: on the left you have the data you used to train your algorithm, and then that algorithm makes predictions. If those predictions are all you care about, that might be fine. But one central problem most of these algorithms have is that they're very bad at conveying what they have learned, so it's very difficult to inquire what goes on in that black box. Probabilistic programming, on the other hand, is inherently open-box, and I think the best way to think about it is as a statistical toolbox for creating very rich models that are really tailored to the specific data you're working with. You can then inquire of that model and really see what's going on and what was learned, so that you learn something about your data rather than just making predictions. The other big benefit, and we'll see this later, is that these types of models work with black-box inference engines: sampling algorithms that work across a huge variety of models. So you don't really have to worry about the inference step; all you have to do is build the model and hit the inference button, and in most cases you'll just get the answers you're looking for. There's not much in terms of solving equations, which is always nice.
Throughout this talk I want to use a very simple example that most of you will be familiar with: A/B testing. You have two websites and you want to know which one works better on some measure you're interested in, maybe the conversion rate or how many users click on an ad. To test that, you split your users into two groups, give group one website A and group two website B, and then look at which had the higher measure. That problem is of course much more general, and since I come from a finance background, I'm going to switch back and forth with the statistically identical problem where you have two trading algorithms and want to know which one has a higher chance of beating the market on each day. Here I'm just going to generate some data, to see what the trivial answers you might come up with yield, and how we can improve on that. You might be surprised that I'm not using real data, but I think that is actually a critical step: before you apply your model to real data, you should always use simulated data, where you really know what's going on and know the parameters you want to recover. That way you know your model works correctly, and only then can you be sure you'll get correct answers by applying it to real data. The data we're going to work with will be binary, just Boolean events, and that type of statistical process is called a Bernoulli process; it's essentially just coin flips. I can get that from scipy.stats: I call bernoulli and pass in the probability of that coin coming up heads, or that algorithm beating the market on a particular day, or that website converting a user, and here I'm sampling 10 trials. So this will be the result.
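A minimal sketch of the data generation just described, using scipy.stats (the variable names and seed are my own, not the speaker's notebook code):

```python
from scipy import stats

# Bernoulli with p = probability of a success
# (heads / beating the market / converting a user)
seed = 123  # fixed seed so the run is reproducible
a = stats.bernoulli.rvs(0.5, size=10, random_state=seed)      # algorithm A, p = 50%
b = stats.bernoulli.rvs(0.6, size=10, random_state=seed + 1)  # algorithm B, p = 60%
print(a)  # just a bunch of binary zeros and ones
print(b)
```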
Just a bunch of binary zeros and ones. So I'm generating two algorithms, one with 50% and one with 60%, and we want to know which one is better. The easiest thing you might come up with is: well, let's just take the mean. And actually, statistically speaking, that's not a terrible idea; it's called the maximum likelihood estimate, and if you ask an applied mathematician what you should do, that might be the answer. I took a course in applied math, and the proofs always work in a very similar way: you have this problem, you say "okay, let the amount of data go to infinity", you solve, and you get that the estimator works correctly in that case. That's great, but what do you do if you don't have an infinite amount of data? That's the much more likely case you'll be in, and that, I think, is where Bayesian statistics really works well. So what happens in our case if I just take the mean of the data I generated? As you can see, we estimate that the chance of this algorithm beating the market is 10%, and 40% for the other one. Obviously that's completely wrong; I used 50% and 60% to generate the data. The obvious answer for why this goes wrong is that I was unlucky, and the observant members of the audience will have noticed that I used a particular random seed here: I picked that seed to produce this very weird sequence of events. But certainly that can happen with real data too: you can just be unlucky, and the first 10 visitors of your website simply don't click.
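The maximum-likelihood estimate for a Bernoulli probability is just the sample mean. A sketch (the seed here is my own; the speaker hunted for one that made the estimates look especially bad, so I make no claim about the exact numbers):

```python
import numpy as np
from scipy import stats

np.random.seed(42)  # a particular seed; small samples can be very misleading
a = stats.bernoulli.rvs(0.5, size=10)  # true p = 0.5
b = stats.bernoulli.rvs(0.6, size=10)  # true p = 0.6

# The MLE of p is simply the empirical mean of the 0/1 outcomes.
print(a.mean(), b.mean())  # with only 10 samples, these can be far from 0.5 and 0.6
```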
The central thing that I think is missing here is the uncertainty in that estimate. 10%, 40%: those are just numbers, and we're missing how confident we are in them. So for the remainder of the talk, a recurring topic will be trying to quantify that uncertainty. Now you might say: well, there is this huge branch of statistics, frequentist statistics, which has designed statistical tests to decide which of the two is better, or whether there is a significant difference at all. You might run a t-test, which returns a p-value indicating how likely you are to observe that data if it was generated by chance. That's certainly a correct thing to do, but one of the central problems with frequentist statistics is that it's incredibly easy to misuse. For example: you collect some data, the test doesn't turn out anything, and the next day you get more data. So what do you do? Well, you just run another test with all the data you have now, right? You have more data, so the test should be more accurate. Unfortunately, that's not the case, and you can see it here. I created a very simple example of that procedure: I generate 50 random binaries with 50% probability for both groups, so there is no difference between them. Then I start with just two events and run a t-test; if that's not significant, I take three and rerun the t-test, and so on: just that process of continuously adding data and testing whether there's a difference. If the p-value is smaller than 0.05 I return true, otherwise false. I repeat that a thousand times, and I look at: even though there is no difference at all between the two groups, they're both 0.5, what is the chance of this test
yielding a significant difference? It's 36.6 percent in that case, which is absurdly high. So this procedure really fails if you use it that way. Granted, I misused the test; it's not designed for that specific scenario. But it's extremely common for people to do exactly that, and for me one of the central problems is that frequentist statistics really depends on your intentions when collecting the data. If you use a different data-collection procedure, for example adding data every day as I just did, then you need a different statistical test. If you think about it, that's actually pretty crazy: say you're a data scientist and you just get data from a database; you have no idea what the intentions were in gathering that data, and you want to be free to explore the data set and run all kinds of statistical tests to see what's going on. So while frequentist statistics is certainly not wrong, I think it's often very constraining in what it allows you to do, and if you don't do things correctly you can shoot yourself in the foot. I think that's a really good setup for Bayesian statistics, which I'll introduce very quickly. At its core we have Bayes' formula. If you don't know what that is: essentially, it's just a formula that tells us how to update our beliefs when we observe data. That implies we have prior beliefs about the world, which we have to formalize; then we see data, and we apply Bayes' formula to update our beliefs in light of that new data, which gives us our posterior. In general, these beliefs are represented as random variables, so I'm also going to talk very quickly about what those are and about intuitive ways of thinking about them.
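The peeking procedure described above can be sketched as follows (my own implementation using scipy's ttest_ind; the exact false-positive rate depends on the seed, so treat the number as illustrative):

```python
import numpy as np
from scipy import stats

def sequential_test(rng, n=50, alpha=0.05):
    """Peek at the data after every new observation and t-test it.
    Returns True if ANY intermediate test came out 'significant'."""
    a = rng.binomial(1, 0.5, size=n)  # no real difference:
    b = rng.binomial(1, 0.5, size=n)  # both groups have p = 0.5
    for i in range(2, n + 1):
        p = stats.ttest_ind(a[:i], b[:i]).pvalue
        if p < alpha:  # nan p-values (zero variance) simply fail this check
            return True
    return False

rng = np.random.default_rng(0)
false_positive_rate = np.mean([sequential_test(rng) for _ in range(1000)])
print(false_positive_rate)  # far above the nominal 5% (the speaker got ~36.6%)
```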
Statisticians like to call their parameters, their random variables, theta, so that's what I'll use here. Let's define a prior for our random variable theta; theta will represent the chance of the algorithm beating the market, or the website converting a user. What is the chance that that happens? (Oops, I didn't want to show that; just this.) The best way to think about a random variable is in contrast to a variable you might know from Python programming, which has a single value, say i = 5. Here we don't know the value; we want to reason about it. We have some rough idea about it, so rather than one value we allow for multiple values and assign each possible value a probability, and that's what this plot shows. On the x-axis are the possible states the system can be in: for example, the algorithm can have a 50 percent chance of beating the market, and I'm going to assume that's the most likely case. That's my personal prior belief: without having seen anything, on average 50 percent is probably a good estimate. I wouldn't be terribly surprised to see 60 percent, though it's less likely; 80 percent is considerably less likely but still possible; and 100 percent, an algorithm that beats the market every single day, I think would be next to impossible, so I assign it a very low probability. I think that's a very intuitive way of thinking about it. So now let's see what happens when I observe data. For that I created this widget, where I can add data with the slider and it updates the probability distribution down here, which will be our posterior. Currently there's no data available.
So our posterior will just be our prior: the belief we have without having seen anything. Now I'm going to add a single data point, a single success: we ran the algorithm for one day and it beat the market. As you might have seen, the distribution shifted a little to the right, and that represents our updated belief that it's now a little more likely that the algorithm generates positive returns. Next, let's reproduce the example from before, where we had one success and nine failures. That was algorithm A, where we estimated a 10% chance of beating the market, which was ridiculous: with that amount of data there's no way we could say that, and given our prior knowledge there's no way we would assume 10% is actually the probability. As I enter that data, the probability distribution down here updates into our new belief: certainly, with nine failures, we assume a lower chance of success for that algorithm, which is represented by the distribution moving to the left. But note that 10% is still extremely unlikely under this posterior, and that is the influence of the prior: we said 10% is unlikely, so the prior pulls our estimate away from those very low values. The other thing to note is that the distribution is still pretty wide. Here we have our uncertainty measure in the width of the distribution: the wider it is, the less certain I am about that particular value. Now I want you to imagine in your head what the distribution would look like if I moved the failures up to 90 and the successes up to 10, so that we're observing data in line with the hypothesis of a 90% failure probability. The main thing that happens is that it moves to the left, but it also gets much narrower, and that narrowing represents our increased confidence from having seen more data.
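The widget's update can be sketched analytically: for a Beta prior and Bernoulli data, the posterior is again a Beta whose parameters are simply incremented by the observed counts. This is a standard conjugacy result, not the speaker's widget code, and the Beta(5, 5) prior is my own stand-in for "roughly centered on 50%":

```python
from scipy import stats

prior = stats.beta(5, 5)  # prior roughly centered on 50%

# Observe 1 success, 9 failures -> posterior Beta(5 + 1, 5 + 9)
post_small = stats.beta(5 + 1, 5 + 9)

# Observe 10 successes, 90 failures -> posterior Beta(5 + 10, 5 + 90)
post_large = stats.beta(5 + 10, 5 + 90)

print(post_small.mean(), post_small.std())  # pulled left, still wide
print(post_large.mean(), post_large.std())  # further left, much narrower
```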
We have more confidence in that estimate, and that's exactly what we want. (By the way, how cool is it that I can use these widgets in a live notebook?) Okay, so where's the catch with all of this? It sounds a little too good to be true: you just create the model, update your beliefs, and you're done. Unfortunately it's not always that easy, and one of the main difficulties is that this formula in the middle here cannot, in most cases, be solved. The case I just showed you is extremely simple: you apply Bayes' formula and can compute your posterior analytically. But with even slightly more complex models you get multidimensional integrals that will make your eyes bleed; no sane human could solve them. Historically, I think that's one of the main reasons why Bayesian statistics, which has been around since the 18th century, was not used much until recently, now that it's having a renaissance: people simply weren't able to solve for the posterior. The central idea of probabilistic programming is: if you can't solve something, approximate it. Luckily for us there is a class of algorithms, the most commonly used being Markov chain Monte Carlo, that instead of computing the posterior analytically, that curve we've seen, draws samples from it.
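As a toy illustration of the idea (my own sketch, not PyMC3's implementation): a few lines of random-walk Metropolis drawing samples whose histogram approaches a known Beta posterior, which here stands in for a posterior we pretend has no closed form:

```python
import numpy as np
from scipy import stats

target = stats.beta(6, 14)  # pretend this posterior had no closed form

def metropolis(logp, n_samples, step=0.1, seed=0):
    """Random-walk Metropolis: propose a jump, accept with prob min(1, ratio)."""
    rng = np.random.default_rng(seed)
    x = 0.5
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.normal(0, step)
        # log acceptance ratio; logpdf is -inf outside [0, 1], so bad proposals
        # are always rejected
        if np.log(rng.uniform()) < logp(proposal) - logp(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

samples = metropolis(target.logpdf, 20000)
print(samples.mean(), target.mean())  # the sample mean approaches the true mean
```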
That's about the next best thing you can do. Just due to time constraints, I won't go into the details of MCMC; we'll just assume it's pure black magic and it works. And it sort of is: it's a very simple algorithm, but the fact that it works in such general cases is still mind-blowing to me. The big benefit is that it can be applied very widely, so often you just define your model, say "go", and it'll give you your posterior. So what does MCMC sampling look like? As we've seen before, this is the posterior that we want, this neat closed-form solution which we can't get in reality. So instead we draw samples from the distribution, and if we have enough samples we can make a histogram, and it'll start resembling the true posterior. Okay, so let's get to PyMC3. PyMC3 is a probabilistic programming framework written in Python and for Python, and it allows the construction of probabilistic models using intuitive syntax. One of the reasons for doing PyMC3, rather than PyMC2 which maybe some of you use: PyMC3 is actually a complete rewrite; it uses no code from PyMC2. There were a couple of reasons. One is just technological debt: the code base of PyMC2 is pretty complex, and it requires you to compile Fortran code, which always causes huge headaches for users to get working. PyMC3, by contrast, is very simple code, and one of the reasons is that we're using Theano as the whole compute engine.
Basically, we're just creating a compute graph and then shipping everything off to Theano. The other benefit we get from Theano is that it can give us the gradient information of the model. There's a newer class of algorithms, Hamiltonian Monte Carlo, which are advanced samplers that work even better in very complex models. They're much more powerful, but they require that extra gradient step, which is normally not easy to get; luckily for us, Theano provides it out of the box, so we don't really have to do anything. The other point I want to stress is that PyMC3 is very extensible, and it allows you to interact with the model much more freely. Maybe you have used JAGS or WinBUGS, or Stan, which is another very interesting recent probabilistic programming framework. While those are really cool, one problem I personally have with them is that they require you to write your probabilistic program in a specific language; then you compile that, you have some wrapper code to get the data into Stan, and some wrapper code to get the results back out. For me that's always very cumbersome: you can't really see what's going on in the model, and you can't debug it. In PyMC3 you write your model in Python code and then really interact with it freely; you essentially never have to leave Python, and for me that is very, very powerful. So you can think of it much more as a library, and we'll see that in a second. Just on the authors: John Salvatier is the main guy who came up with it, and Chris Fonnesbeck has also programmed quite a bit. Currently we're still in alpha, but it already works fairly well.
The main reason it's alpha is mostly that we're missing good documentation. We're currently writing it, but if you're up for it and would like to help out with that, it is certainly more than appreciated. Okay, so let's look at the model from our earlier example and see how we can solve it in PyMC3. For that, I'm first going to write down the model as you would in statistical notation. We have these two random variables that we want to reason about, theta_a and theta_b, which represent each algorithm's chance of beating the market. The tilde here means "is distributed as": we're not working with numbers but with distributions. This is a Beta distribution, the distribution we were looking at at the beginning; it lives on zero to one, so if you're working with probabilities, the Beta distribution is the one to use. This is the thing we want to learn about, given data. And how do we learn about it? Well, we observe data. The data I simulated was binary, so it came from a Bernoulli distribution; we therefore assume the data, the zeros and ones, is distributed according to Bernoulli distributions. The probability of that Bernoulli distribution, which before was just fixed at 0.5, is now what we actually want to infer; since we don't know that value, we replace it with a random variable, the theta_a we defined above. That is how these models commonly look, and the other point I want to make here is that you really see how you're basically creating a generative model.
So you might wonder how you can construct your own model, and I think a good path is to think about how the data would have been generated. Here I know there's this probability, and it generated Bernoulli data, so that's the model I'm going to create. But you can get arbitrarily complex: you can say, well, I have all these hidden causes that relate in complex ways to the data, and then you invert that model using Bayes' formula to infer those hidden causes. So here I'm going to generate data again, a little more this time: again 50 and 60 percent probability of beating the market, or conversion rate, and 300 values. This is what the model looks like in PyMC3. First we import pymc as pm and instantiate the model object, which will hold all the random variables and whatnot. Another improvement over PyMC2 is that everything you specify about your model you do under this with context: everything you do underneath it will be included in that model object, so you don't have to pass it in all the time. Underneath, this should look pretty familiar from before, where I had these random variables: theta_a distributed as a Beta distribution. Here I write the same thing in Python code: theta_a is a Beta distribution, we give it a name, and we give it its two parameters; alpha and beta are the parameters this distribution takes, the number of successes and failures. This is the prior I showed before that was centered around 50%. I do the same thing for theta_b, and now I'm going to relate those random variables to my data. As I said before, that's a Bernoulli, which I instantiate with another name, and instead of the
I give it the random variable right that we want to link together and Since this is an observed node We give it that array of 300 binary numbers that are generated a slide before right So this links it to the data and links it up to the random variable And the same for be so Up until here nothing happened We just basically plug together a few probability distributions that make up how I think my data is structured now It's often a good idea to start the sampler from a good position and For that we're gonna just optimize the log probability of the model using find map for find the maximum upper stereo value and Then I'm gonna instantiate the sampler. I want to use their various you can choose from him using a slice sampler, which is Which works quite well for these simple models and now I actually want to draw the samples from the posterior, right? and for that I call the sample function and I Tell it how many samples I want 10,000. I provided with the step method and I give it the starting value and when I do this call it'll take a couple of seconds to run the sampling algorithm and Then it really would return the structure to which I call trace here and that is essentially a dictionary For each random variable that I have assigned. I will get the samples that were drawn and now that I ran that I can Inquire about my posterior, right? So here I'm using seaborne, which just as an aside is an awesome plotting library on top of metplotlib You should definitely check it out creates very nice statistical plots. 
For example, it has this nice distplot function that basically creates a histogram, but one that looks much nicer and has, for example, this nice shaped line. I give it the samples my MCMC sampling algorithm drew for theta_a and theta_b, and it plots the posterior I just created. That is, again, the combination of my prior belief updated by the data I've seen, and now I can reason about it. The first thing to see is that for theta_b, the chance of that algorithm beating the market is 60%, and that's what I used to generate the data. It's good that we get that back, and again, that's why we use simulated data: to know we're actually doing the right thing. The other one is around 50% or 49%. The other thing to note is that instead of just a single number that seemingly fell from the sky, which is what we'd get by taking the mean, we have our confidence estimate: we know how wide that distribution is. We can answer many questions with it, like how probable it is that the chance of success for that algorithm is 65%, and get a specific number out that represents our level of certainty. We can also do other interesting things, like hypothesis testing to answer our initial question: which of the two actually does better? For this we can just compare the samples drawn for theta_a to the samples of theta_b: we ask how many of one are larger than the other, and that tells us that with probability 99.11%, algorithm B is better than A, which is exactly what we want. By consistently carrying our confidence estimate through from the beginning to the end, everything we say has a confidence and probability estimate associated with it. Okay, so that was super boring up until now.
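As an aside, the sample-based comparison just described can be sketched without any MCMC machinery: draw posterior samples (here from analytic Beta posteriors standing in for the trace) and count how often one beats the other. The observed counts are my own illustration, not the speaker's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Stand-ins for the MCMC trace: Beta(5 + successes, 5 + failures) posteriors,
# e.g. after observing 150/300 successes for A and 180/300 for B.
theta_a = stats.beta(5 + 150, 5 + 150).rvs(10000, random_state=rng)
theta_b = stats.beta(5 + 180, 5 + 120).rvs(10000, random_state=rng)

# "Hypothesis test": the fraction of posterior samples where B beats A.
p_b_better = (theta_b > theta_a).mean()
print(p_b_better)  # close to 1, since B's posterior sits to the right of A's
```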
Hopefully it gets a little more interesting now. Consider the case where instead of just two algorithms we might have 20, which is more like what we have at Quantopian: many users have these algorithms. Maybe we want to know not only each individual algorithm's chance of success, but also how the algorithms are doing overall, the group average: are they, collectively, consistently beating the market or not? The easiest model you could build is just the one we had before, but instead of two thetas, theta_a and theta_b, we have 20 thetas. That's fair, and it's called an unpooled model, but it's somehow unsatisfying, because we probably assume these are not completely separate. The algorithms work in the same market environment, some of them will have similar properties, some use similar strategies, so they will be related somehow: there will be differences, but also similarities, and this model does not incorporate that. There's no way for what I learned about theta 1 to be applied to theta 2. The other extreme would be a fully pooled model, where instead of each algorithm having its own random variable, I assume a single random variable for all of them. That's also unsatisfying, because we know there is structure in our data which we're not exploiting, and also, even though we might get group estimates, we couldn't say anything about how well a particular algorithm did. The solution, which I think is really elegant, is called a partially pooled or hierarchical model, and for that we add another layer on top of the individual random variables.
Up until here we only had the model from before, with all these independent thetas. But instead of placing a fixed prior on them, we can actually learn that prior, and have a group distribution that applies to all of them. These models are really powerful and have many nice properties. One of them: what I learn about theta 1 from the data will shape my group distribution, and that in turn will shape the estimate of theta 2. So everything I learn about the individuals I learn about the group, and what I learn about the group I can apply to constrain the individuals. Another example where we do this quite frequently is in my research in psychology, where we test, say, 20 subjects on a behavioral task and often don't have enough time to collect lots of data. If we fit a model to each subject by itself, the estimates will be very noisy; a hierarchical model is a way to learn from the group and apply that back to each individual, so we get much more accurate estimates for everyone. That's a very nice property of these hierarchical models. So here I'm again going to generate some data. Essentially it will just be a 20-by-300 array: 20 subjects, 300 trials, where each row holds the binary outcomes of one individual. For convenience I also create an indexing mask that I'll use in a second; it might not make sense right now, but keep in the back of your mind that the first row is just an index for the first subject, indexing into the vector of random variables. This is the data we're going to work with. Okay, so how does that model look in PyMC3? First I create my group variables, the group mean and the group scale: what's the average success rate across all algorithms?
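A quick aside on the indexing mask from a moment ago: in NumPy terms it is just fancy indexing, where a per-observation subject index picks out the matching theta for every data point (variable names and numbers are my own sketch):

```python
import numpy as np

n_subjects, n_trials = 20, 300

# One row of 0/1 outcomes per subject.
data = np.random.default_rng(2).binomial(1, 0.55, size=(n_subjects, n_trials))

# Index mask: row i is filled with the value i, matching data's shape.
subj_idx = np.repeat(np.arange(n_subjects), n_trials).reshape(n_subjects, n_trials)

# Given a vector of 20 thetas, theta[subj_idx] broadcasts each subject's theta
# across that subject's 300 trials -- same shape as data, element for element.
theta = np.linspace(0.4, 0.6, n_subjects)
expanded = theta[subj_idx]
print(expanded.shape)  # (20, 300)
```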
And how variable are they? That's going to be the scale parameter. Which priors to use here is a modeling choice: I use a Beta distribution for the group mean, and a Gamma distribution for the scale, because the scale can only be positive. The exact parameters are not that critical. Now, unfortunately, the Beta distribution is parameterized in terms of an alpha and a beta parameter, not in terms of a mean and variance. Fortunately, there's a very simple transformation we can apply to the mean and variance parameters to convert them to alpha and beta values, which is what I'm doing here. The specifics aren't important; I just wanted to show how easy it is. In some other languages it's not a given that you can freely combine and transform random variables like this and still have everything work out. The reason it works here is that these are just Theano expression graphs: once I multiply them, Theano combines the underlying probability formulas and does the math in the background. Next I need to hook that up with the thetas, my random variables for each algorithm. Instead of a for loop generating 20 of them, I can pass in the shape argument, which gives me a vector of 20 random variables; so this theta is not a single variable but actually 20 of them. Note that before I had my hard-coded prior of five and five here in the previous model, but now I replace it with the group estimates that we are also going to learn about. Again my data is going to be Bernoulli distributed, and for the probability I now use the index I showed you before, which indexes into this vector
so that it turns into a two-dimensional array of the same shape as my data; then it matches one-to-one and just does the right thing. Then I pass in the 2D array with one row of binary outcomes per algorithm. Again I find a good starting point, and note that here I'm using the NUTS sampler, a state-of-the-art sampler that uses the gradient information and works much better in complex models. These hierarchical models are often very difficult to estimate, but this type of sampler does a much better job; that was actually one of the reasons to develop PyMC3. Then with the traceplot command we can create plots. Don't mind the right side; here we get our estimates of the group mean, and again we have not a single value but rather the full confidence: on average, we think it's about 46 percent. We also have the scale parameter, and the 20 individual algorithms, theta 1 through theta 20, all of them constraining each other in this model. So that's pretty cool. To wrap up: I hope I've convinced you that probabilistic programming is pretty cool, and that it allows you to tell a generative story about your data. If you listen to any tutorial on how to be a good data scientist, it's about telling stories about your data; but how can you tell stories if all you have is a black-box prediction algorithm? I think that's where probabilistic programming is really quite an improvement. You don't have to worry about inference: these black-box sampling algorithms work pretty well. You do have to know what it looks like when they fail, and it can be tricky then to get them going, so it's not super trivial, but they often work out of the box. And lastly, PyMC3 gives you these advanced samplers. I'm going to skip that and go to further reading.
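One detail from the hierarchical model worth writing out: the mean/variance to (alpha, beta) conversion for the Beta distribution is a short moment-matching identity (the function name and numbers here are my own, not the speaker's code):

```python
def beta_params(mean, var):
    """Convert a desired mean and variance into Beta(alpha, beta) parameters.
    Requires var < mean * (1 - mean); otherwise no Beta distribution matches."""
    kappa = mean * (1 - mean) / var - 1  # concentration implied by the moments
    return mean * kappa, (1 - mean) * kappa

alpha, beta = beta_params(0.5, 0.01)
print(alpha, beta)  # Beta(12, 12): mean 0.5, variance 0.01
```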
So check out Quantopian, everyone, and design algorithms that hopefully have a higher than 50 percent chance of beating the market. For some content on PyMC3: I have written a couple of blog posts on it, and currently that's probably the best resource for getting started, mainly just because there is not that much else written about PyMC3 in terms of documentation. And down here are some other really good resources that I recommend for learning about this. So thanks a lot.

Yes, please. Yeah, so the question is that Stan provides a lot of tools for assessing convergence and many diagnostics, and also the very nice feature of transforming variables and placing bounds on them. So PyMC3 has the most common statistics that you want to look at, like the Gelman-Rubin R-hat statistic, and you can sample in parallel and then compare chains. And we do have support for transformed variables. It's not as polished as in Stan, just because PyMC3 is still an alpha, but it's there, and you can bound your parameters. So that works, but it's not quite as streamlined yet. More questions? Sure.

Great question. So the question was: what if I can't use the sampler that we provide here, Hamiltonian Monte Carlo, because it's too expensive, so how difficult would it be to use my own sampler? That, I think, is a big benefit of PyMC3: you basically just inherit from the sampler base class, overwrite the step method, and then you can do your own proposals and acceptances and rejections. So that's very easy. If you look at Stan, for example, I haven't done it, but I imagine that it's quite difficult just from looking at the code.
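Setting the exact PyMC3 base-class names aside, the step method you would override boils down to a propose/accept-reject rule. A self-contained Metropolis sketch in plain Python, with hypothetical names, shows the shape of it:

```python
import math
import random

def metropolis_step(x, logp, scale=1.0):
    """One Metropolis step: propose a jump, then accept or reject it."""
    proposal = x + random.gauss(0.0, scale)    # your own proposal rule goes here
    if math.log(random.random()) < logp(proposal) - logp(x):
        return proposal                        # accept the jump
    return x                                   # reject: keep the current value

# Demo: sample a standard normal target.
random.seed(42)                                # fixed seed so the demo is repeatable
logp = lambda x: -0.5 * x * x                  # log density of N(0, 1), up to a constant
x, samples = 0.0, []
for _ in range(5000):
    x = metropolis_step(x, logp)
    samples.append(x)
```

In PyMC3 the same propose/accept logic would live inside the overridden step method, with the model supplying `logp`.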
It's really hardcore C++, and all the templates make my head hurt.

The other question, if I understood you correctly, was how it compares speed-wise, or whether writing your own sampler in Python won't be slow. I think most of the time is actually not spent in the sampler but rather in evaluating the log-likelihood of the model, and also the gradient computations, which are very difficult. And it's true that Stan is fast, but it's fast once it gets started; it actually takes quite a while to compile the model. In that sense I haven't really done a speed comparison, and we have recently noticed some areas where PyMC3 is not fast, and we need to fix those and speed it up. Certainly the Stan guys have done a lot to really make it fast, and that's the benefit of having C++. But on the other hand, one benefit I think of Theano is that it does all these simplifications to the compute graph, does clever caching, and you can even run it on the GPU. We haven't really explored that to the fullest extent yet, but I think there are lots of potential speed-ups that Theano alone could give us. And another answer to your question: if, for example, you really do spend that much time in your sampler just proposing jumps,
You could also use Cython, for example, and write your sampler in that.

The question is about parallel sampling, and that is possible: there is a psample function you use instead of the sample function, and that will distribute the model. It doesn't quite work in every instance yet, but it uses multiprocessing, so you get true parallelization. And just as an aside, there's this really cool project that someone on the mailing list just wrote about. It's with PyMC2, but the same trick could be applied to PyMC3: he uses Spark to basically do the sampling in parallel on big data. If you have data that doesn't fit on a single machine, you can run individual samplers on subsets of the data in parallel and then aggregate them, and Spark lets you do that very nicely. He basically hooked up PyMC and Spark, so that's really exciting.
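The Spark idea in miniature: run independent workers over shards of the data in parallel and then pool their results. This thread-based sketch is not PyMC or Spark code, and the per-shard "sampler" is just a stand-in that returns the shard mean; it only illustrates the split/map/aggregate pattern:

```python
import statistics
from concurrent.futures import ThreadPoolExecutor

def sample_shard(shard):
    # Stand-in for running a full sampler on one shard of the data;
    # here it just returns the shard mean as a crude "estimate".
    return statistics.mean(shard)

data = list(range(100))                      # pretend this is too big for one machine
shards = [data[i::4] for i in range(4)]      # split into 4 shards

# Run the per-shard workers in parallel and collect their estimates.
with ThreadPoolExecutor(max_workers=4) as pool:
    shard_estimates = list(pool.map(sample_shard, shards))

combined = statistics.mean(shard_estimates)  # naive aggregation of the shard results
```

In the real project, each worker would run a sampler on its subset of the data and the aggregation step would combine the resulting posterior draws, which Spark's map/reduce model handles nicely.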