Hello, and welcome back to the Sports Biomechanics lecture series, as always supported by the International Society of Biomechanics in Sports and sponsored by Vicon. I'm Stuart McErlain-Naylor from the University of Suffolk, and today I'm joined by Tony Myers, who is a professor of quantitative methods at Newman University. He's also chair of the Sport, Exercise and Health Analytics Special Interest Group for the British Association of Sport and Exercise Sciences. Tony has kindly offered to record a lecture on Bayesian statistics for sports science. This is a topic that a few people requested after Kristin Sainani's excellent lecture on statistics a few weeks back, so I was really happy that Tony agreed to deliver it. The lecture is pre-recorded, so if anybody has any questions as we're going along, either drop them in the comments on YouTube or get in touch with either myself or Tony via Twitter, for example, and one of those ways we'll try to make sure you get an answer. Thank you.

Thank you, Stuart. My name's Tony Myers, and this presentation is on Bayesian data analysis. The aims of the presentation are: to very briefly highlight some of the issues identified with the traditional approaches we've tended to use in sports science, pointing to sources where you might explore those issues and their solutions further; to look at one of the solutions to the issues highlighted and at the benefits of Bayesian data analysis; to provide an introduction to some of the basic concepts of Bayesian data analysis; to perform and interpret some very basic analyses using some very user-friendly software, JASP, which is free to download and constantly being updated; and then to show an example of a Bayesian analysis workflow that involves slightly more complex analysis and slightly different software, just to highlight some of the things you might want to explore further.

A number of issues have been identified with traditional statistical approaches. Of particular importance was a follow-up to the American Statistical Association's 2016 statement urging caution in the use of p-values. A special edition of The American Statistician in 2019 went beyond those previous claims, in an editorial called "Moving to a World Beyond 'p < 0.05'". What did they add to the 2016 statement? Basically, they said it's time to stop using the term "statistically significant" entirely, along with variants such as "significantly different", "p < 0.05" and "non-significant", or other forms where you might highlight these things with asterisks in a table or in some other way. So what are the alternatives, if we take their suggestion seriously? Well, a number are highlighted in that special issue, and it's worth looking at those, including the proper use of traditional p-values. There are a number of credible alternatives, and one of them is the focus of this lecture: Bayesian data analysis. What are the benefits of Bayesian data analysis? Well, one is the use of prior scientific knowledge.
This is potentially a double-edged sword, because how do we agree on what prior scientific knowledge is? But it's certainly a way of incorporating what we already know about something, and we do often know something, even if it's only about the measurement itself, about the measurement scale in the problem we're addressing. Bayesian analysis produces uncertainty estimates as a full distribution rather than just an interval, and this has benefits for communication and for calculating probabilities, which are very intuitively interpreted. The probabilities and intervals generated have a common-sense interpretation: when we talk about a 50-50 chance of rain, we give rain a 50% chance, and that's how we generally talk about probability in everyday language. Bayesian probability allows us to make those kinds of probability statements. It doesn't rely on hypothetical data for inference: in a traditional analysis we assess the compatibility of our data with a hypothetical null distribution, but Bayesian analysis doesn't do that. Bayesian analysis also allows for model comparison in a way that traditional analysis doesn't; we've got a number of tools to compare models, and to criticize and explore those models, in a very straightforward way. And complex models that are problematic for non-Bayesian methods can be constructed. So these are some of the benefits. Like any other type of endeavor, though: garbage in, garbage out. If your methods are poor and your research is ill-conceived, the best statistical analysis will not rescue the study and make your results useful.

We use a sample to draw conclusions about a population, and we do that either by estimation or by hypothesis testing. We estimate differences, relationships or variation, or we test a hypothesis, comparing a null to a research hypothesis or comparing hypotheses against each other. We tend to make inferences about groups, or potentially about individuals. We sample from a population of measurements, so it doesn't actually have to be people; it can be measurements of an individual over time. We're making inferences about that population from the sample, and it's important to be clear what population we're attempting to generalize to: subgroups, geographic representation, and so on. That matters particularly in Bayesian analysis, where we're incorporating current knowledge, using prior knowledge before we collect the data, and that prior knowledge will differ depending on the population we're trying to describe.

So the basic principle behind Bayesian inference is to start with prior knowledge and combine that with data in the form of a likelihood, and that produces our posterior distribution. How is Bayesian hypothesis testing different? Well, first of all, it uses a Bayes factor rather than a p-value. The Bayes factor BF10 tells us how likely the observed data are to occur under H1 compared to H0, so in favor of the alternative or research hypothesis compared to the null hypothesis: for example, how likely a difference is compared to no difference following our intervention. The Bayes factor BF01 is the reverse: how likely the observed data are to occur under H0 compared to H1, so how likely no effect is compared to an effect.
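As a concrete illustration of these two quantities, here's a minimal R sketch using the BayesFactor package. The data are simulated stand-ins, not the lecture's example:

```r
# Minimal sketch: a Bayesian paired t-test with the BayesFactor package.
# 'pre' and 'post' are hypothetical paired measurements.
library(BayesFactor)

set.seed(1)
pre  <- rnorm(20, mean = 30, sd = 3)          # e.g. placebo condition
post <- pre + rnorm(20, mean = 1.5, sd = 2)   # e.g. intervention condition

bf10 <- ttestBF(x = post, y = pre, paired = TRUE)  # evidence for H1 over H0
bf10
1 / extractBF(bf10)$bf                             # BF01 is just the reciprocal
```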
For BF10 we've got a scale. A value of 1 means the data are equally likely either way: the evidence doesn't favor either the null or the alternative hypothesis. As we go from 1 to 3 the evidence starts to get stronger, but between 1 and 3 it's still anecdotal; we really don't want to make claims for an effect. When we go above 3 we start getting what we might call moderate evidence, above 10 strong evidence, above 30 very strong evidence, and above 100 extreme evidence favoring our research, or alternative, hypothesis. BF01 is the same but in reverse. Where we've got 1, it's the same thing: the evidence doesn't favor either the null or the research hypothesis. As we go towards 3 it starts favoring the null a little, but we can't make any claims; then 3 to 10 is moderate evidence, above 10 strong, above 30 very strong, and above 100 extreme, just the reverse of BF10. Which you choose depends on the research question you're interested in.

So let's take a silly example, just to look at how we interpret BF10 in relation to a hypothesis. We're looking at whether a person is pregnant or not, and the BF10 suggests we have extreme evidence for the research hypothesis of pregnancy. You can see the little maroon graphic, the pie chart, which is something produced by the JASP software to illustrate this: all maroon means complete support for the research hypothesis. Take our next example. Presuming we have no prior information about sex differences in pregnancy, in this case we're not sure: we have a Bayes factor of 1, right in between, a 50-50 chance, and in the pie chart maroon and white are evenly distributed, half and half, so we're really unsure whether this person is pregnant or not. In the final case, the evidence suggests the person in the picture is not pregnant, and we have extreme evidence for that, with a BF10 of 0.003, almost zero. So that's how it works: we evaluate the strength of evidence given by the Bayes factor, which gives us perhaps a little more than a p-value does.

How does Bayesian parameter estimation differ from traditional parameter estimation? Well, firstly, rather than just point estimates we get a probability distribution for any parameter we're interested in: mean differences between treatment and control, mean differences pre and post, relationship values, whatever parameter we're interested in, we get a distribution on that parameter that represents our degree of knowledge and the uncertainty around it. So this is what we get when we've conducted a Bayesian data analysis: a posterior distribution, a Bayesian posterior probability distribution. This particular example is differences in sprint times between a control and a treatment group, so we've got a range of potential differences here, from minus 2.5 to almost six seconds, with density on the y-axis. What does this mean? Well, the higher the density, the more probable that x value is; in this case something like 2.45 seconds is the most probable difference, conditional on the data we have, so the most probable population estimate for the difference between the control and treatment groups is 2.45 seconds.
The lower the density, the less probable that x value is. So here zero is much less likely, because the density at zero is much lower than at the midpoint of the distribution. With a Bayesian credible interval on this posterior distribution we can say there's a 99% chance the true population difference is contained in the interval, in this particular case between -0.09 and 4.56 seconds. Now, of course, all of these percentages are arbitrary values; we've got the traditional values of 99 and 95 that we use in traditional statistics, but they are purely conventions. We could calculate a 95% chance that the true difference lies between 0.27 and 4.19 seconds, or a 90% chance, or an 80% chance; whatever we want to calculate is very straightforward and easy to do. Another very useful property is that we can get the probability of a difference above any particular value. If we've got an idea of the measurement error, or if a time difference above some particular value would be really important in the context we're looking at, we can very straightforwardly calculate the probability of a difference above that value. This allows us to say things we can't say with traditional statistics. If we have an effect, for example, we could say there's a 90% chance this intervention will be effective at increasing strength, or a 98% chance it will be effective at increasing speed, depending on what we're looking at and obviously on the size of the effect. We've got direct probability statements about the effect, and that's different: we're not comparing it to the null. We can obviously balance the evidence for the null and the research hypothesis with Bayes factors, but when we're making parameter estimates we get direct probability statements. Of course it could be 50-50, it could be any percentage we find, but we can say that, and communicate it in language people tend to understand: probabilities that make sense in a practical way to practitioners, coaches and even researchers.
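To make that concrete, here's a minimal R sketch of how such statements fall out of posterior draws; the draws here are simulated stand-ins for a real posterior:

```r
# Sketch: direct probability statements from posterior draws.
# 'draws' stands in for posterior samples of a treatment-control difference.
set.seed(1)
draws <- rnorm(4000, mean = 2.45, sd = 1.1)   # hypothetical posterior draws

quantile(draws, c(0.005, 0.995))   # 99% equal-tailed credible interval
quantile(draws, c(0.025, 0.975))   # 95% interval; the percentage is our choice
mean(draws > 0)                    # P(difference > 0)
mean(draws > 0.5)                  # P(difference exceeds an important value)
```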
Now, for the symmetrical posterior distribution we just looked at, the two types of credible interval I'm going to talk about would be exactly the same: exactly the same values, for any of the credible intervals we calculated. However, that's not the case for skewed distributions, as we'll see in a moment. The two types of credible interval generally calculated and reported are, first, the highest density interval, or HDI, sometimes called the highest posterior density interval. In the HDI, all points within the interval have a higher probability density than points outside the interval, and that can be important, particularly when we're looking at skewed distributions, because there we will get differences. The other type is probably more familiar, at least in terms of how a traditional confidence interval is usually described: this is the equal-tailed interval, or ETI. For a 95% equal-tailed interval, 2.5% of the distribution lies beyond each of its limits. Let's look at how they differ when we've got a skewed distribution. This is the 95% highest density interval of the difference in sprint times, if it were a skewed distribution; if we compare that with the equal-tailed interval, we see that the values in the interval differ. When we're working in the values that were actually measured, the non-transformed values, the 95% HDI is the best bet; for transformed variables the ETI will be the better bet, but generally the highest density interval is the best choice when we're using the units we've measured. So Bayesian analysis allows us to determine probabilities associated with both the null and the research hypothesis, something we can't do with traditional hypothesis testing, because there all the probabilities are calculated assuming the null is true in the first place. And importantly, when we're doing parameter estimation we get intuitively interpreted credible intervals.
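A small R sketch of that distinction, computing both intervals on simulated skewed draws; the HDI function here is a simple hand-rolled version (packages such as HDInterval or bayestestR provide equivalents):

```r
# Sketch: equal-tailed vs highest-density intervals on a skewed posterior.
set.seed(1)
skewed <- rlnorm(4000, meanlog = 0.7, sdlog = 0.5)  # hypothetical skewed draws

quantile(skewed, c(0.025, 0.975))  # 95% ETI: 2.5% of draws beyond each limit

hdi <- function(x, prob = 0.95) {  # 95% HDI: narrowest window holding 95%
  x <- sort(x)
  n <- length(x)
  m <- ceiling(prob * n)                  # number of draws inside the window
  widths <- x[m:n] - x[1:(n - m + 1)]     # width of each candidate window
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + m - 1])
}
hdi(skewed)  # narrower than the ETI, and shifted, on this skewed distribution
```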
Let's look at some basic data analysis using the JASP software. I really do want to big up this software: it's not only free to download and use, but it's constantly being updated. Let's have a look at the data first. This is simulated data, loosely based on a study that was actually conducted; I've simulated values in the same spirit. In the actual study, participants were blinded to each condition, and the number of capsules received was the same irrespective of the dose, so they got either a placebo or three milligrams of caffeine per kilogram of body weight. The dependent variable in this instance is performance in three countermovement jumps. This is what the data look like in the spreadsheet view of the JASP software. As I say, JASP offers a range of traditional and Bayesian tests; you've got equivalence testing, you've got mixed-effects models, there's a range of things you can do with it now, and there's a great supportive network around the software, so I can't recommend it highly enough. We enter the data into JASP using a CSV file, so save the Excel sheet as a CSV file, one of the options in Excel, and it looks very similar to the spreadsheet.

Now we go to the T-Tests menu and select Bayesian Paired Samples T-Test, very similar to what you might do in SPSS or other similar software. I click on the variables and transfer them over to the analysis window, and I get some output in the right-hand pane. I'm going to add the prior and posterior information visually, because I think that's useful, and I can also tick some additional descriptives to look at what's happening. From the descriptives it certainly looks like the caffeine had an effect, with a bit more variation and a higher standard error. What does that come out as when we look at the graph? The dotted line gives us the prior; in this case it's a Cauchy prior centered on zero with a scale of approximately 0.707, so a fairly broad prior, but centered on zero. Given that prior, we get this particular posterior, the distribution drawn with a continuous line rather than the broken line, and the Bayes factor tells us the data are about eight times more likely under the research hypothesis than under the null. If we wish to summarize the differences, this is in terms of a standardized effect size, similar to a Cohen's d, rather than raw values. So we can summarize and say the difference between conditions is about 0.6 of a standard deviation. If you do like the standardized categories for effect sizes, and I'm not a fan of them, that would be a medium effect. That's reflected in the middle of the posterior distribution, the median value, which is why it's labelled "median", and then we get the 95% credible interval, which tells us there's a 95% chance the true standardized difference lies between 0.164 and 1.120. To see what that looks like visually, I'd really suggest this website; it's excellent, and it provides a range of other simulations too. You can actually move the scale and look at what different standardized differences look like; at this particular point I've moved the scale to 0.63. It also gives some other interpretations, and one useful one is Cohen's U3, which tells us that this size of difference means 74% of the caffeine condition are above the mean of the control condition, so it gives us a common-sense interpretation of that particular standardized difference.
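As an aside, that Cohen's U3 figure is a one-line calculation under a normal model; a minimal sketch:

```r
# Sketch: Cohen's U3 for a standardized difference d, assuming two normal
# distributions with equal SDs. U3 = proportion of the treatment condition
# above the control mean = pnorm(d).
d <- 0.63
pnorm(d)   # ~0.736, i.e. roughly 74% of the caffeine condition above the
           # control mean, as reported above
```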
If we were to run a standard paired-samples t-test on this, we'd see a p-value suggesting the data aren't really that compatible with the null; we'd say the data would be surprising if there really were no difference between these conditions. So, in essence, a similar conclusion. We've also got the values of our confidence interval; those are different because they're not standardized differences, they're raw differences. So we've got moderate evidence in terms of our Bayes factor. Now, we can do some additional things which I think are really useful. One is to look at the effect of different widths of prior: if we look at the Bayes factor robustness check, all the priors, even very wide ones, still come out in this moderate category. Another interesting check we can do is the sequential analysis, which tells us how the evidence shifts with each data point. We can see that initially there's some evidence for the null hypothesis, but as more data were collected the evidence became stronger for the alternative, the research hypothesis.

So let's take things further and conduct some additional Bayesian data analysis with some different software. I'll mention here the different software packages I use and think are really excellent: JASP, as we've discussed, but also Stan, which I use through an R interface but which you can also use with Python, and INLA, integrated nested Laplace approximation, which is really quick. I tend to use, not always but often, the brms package in conjunction with Stan, because it's really flexible and helps you produce models quite quickly. Let's have a look at the efficacy of a post-game recovery protocol, in this case for basketball players. Again this is simulated data, but it's not completely made up; I've tried to use values similar to those I've obtained from studies in the past, so it's similar to actual empirical data. The groups involved are a treatment group and a control group, randomized, which is not always possible in an applied context, but for the simulation it's no problem. Measurements are at baseline, two hours post-game and 24 hours post-game, and the dependent variable is 30-meter sprint speed.

The workflow: I'm going to explore the data visually; fit different plausible models based on that visual exploration; do some diagnostic checks; compare models using two different systems, one the Bayes factor, the other leave-one-out cross-validation; and then extract key information to address the research question. To explore the data visually, I simply make some box plots of the three measurement points in the two groups. We can see at the baseline, the pre-measurement, the treatment group is slightly faster, a slightly higher sprint speed in meters per second; both groups drop off in the post-game measurement, unsurprisingly, and then both recover a little. Visually it looks like the treatment group may be doing better, but we have to see what the analysis says. I also want to plot the density of sprint speed, the dependent variable. In building the model I need to choose a response distribution; we could use different response distributions, a Gaussian distribution, maybe a t distribution, but I'm going to try a skew-normal distribution, given the shape of this variable, and see whether that actually models the data better than the other distributions. One of the advantages of Bayesian modeling is that I can do this very flexibly; as in all data analysis, there are a number of subjective decisions to be made. As an overall model, even though I'm going to change things like the response distribution and the prior information, given that I've got baseline measures for both groups I'm going to compare post measures at the two-hour and 24-hour measurement points using baseline as a covariate. So it's post measures conditional on baseline, adjusted for baseline. This is similar to an ANCOVA model and helps avoid regression to the mean.
In particular, all the treatment can have affected is the post measures; it can't have affected the baseline measures, so the ANCOVA-style model seems to me an appropriate model. I'm going to use different priors. First, a weakly informative prior, with a mean of zero and a standard deviation of three. This gives me a broad range of differences, including some that are very unlikely: I'm not going to see a difference of minus 10 meters per second in a 30-meter sprint between the groups, or plus 10. So it's only weakly informative; it's not a uniform prior, it's not putting probability across all values, but it's still very broad. As far as the priors for the intercept and the standard deviation go, I'm going to use the defaults from the brms package, which I'll use to fit the model: a Student's t distribution on the intercept, whose long tails allow quite broad values, which I've found useful in a number of models, and similarly a half Student's t on the standard deviation, which doesn't allow negative standard deviations. Those seem reasonable choices and work well as defaults.

Second, a measurement-constrained prior. What I'm trying to get at here is: given this is a group that's been randomized to control and treatment, what are the likely differences in sprint speed going to be? I don't think they're going to be much more than about 2.5 meters per second either way, so I've centered this prior on zero with a standard deviation of one to try to capture that, again going with the defaults on the intercept and standard deviation.

Third, an informative prior, based on some previous data. There are previous studies that looked at a similar thing, and, knowing which way round my groups are coded in the regression model, I'm going to use a mean of minus 0.5 and a standard deviation of one, to reflect the general differences that have been found on average in those studies, again with the default priors on the intercept and standard deviation.

So I've got the weakly informative prior, with differences modeled by a normal distribution and the intercept and standard deviation by the package defaults; the measurement-constrained prior; and the informative prior based on previous studies. I'm going to include these across all the models, so in essence I'll fit lots of different models. I'm also going to model the response with both a normal distribution and a skew-normal, given the shape of the dependent variable when I looked at its density. I'm going to allow individual intercepts to vary, because it looked like there were quite different sprint speeds across individuals in both groups and I've got several data points for each person. And given that it looked like there were differences in variance across the groups, I'm going to allow for heteroscedasticity between the groups; again, I can build that into the model quite straightforwardly in brms or Stan. Then I run the models.
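Here's roughly what a model of that kind looks like in brms. This is a sketch under assumptions: the variable and data-frame names (speed, baseline, group, time, athlete, dat) are hypothetical, and only the "measurement-constrained" slope prior is set explicitly, keeping the brms defaults for the intercept and standard deviations:

```r
# Sketch: post-game sprint speed conditional on baseline (ANCOVA-style),
# varying intercepts per athlete, skew-normal response, and group-wise
# residual SDs (heteroscedasticity). All names are hypothetical.
library(brms)

priors <- prior(normal(0, 1), class = "b")  # measurement-constrained slopes

fit <- brm(
  bf(speed ~ baseline + group * time + (1 | athlete),
     sigma ~ group),                 # model the residual SD separately by group
  family = skew_normal(),
  prior  = priors,
  data   = dat,
  chains = 4, cores = 4, seed = 1,
  save_pars = save_pars(all = TRUE)  # needed later for bayes_factor()
)
```

Swapping in family = gaussian(), or changing the prior statement, gives the other model variants described above.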
These are some of the plots for what turned out to be the best model. I've got trace plots, to see visually whether there are any problems. A visual inspection won't necessarily identify problems: I can see if there is an obvious problem, but I can't say there isn't one just because it doesn't look like it, which is why I also want to do some other checks. The R-hat statistic, which comes as an output from Stan, the brms package and some other packages as well, lets me check convergence: I'm looking for an R-hat of around 1.01, or certainly less than 1.1, and it appears the chains have all converged, as far as we can tell at least, so the models pass that diagnostic check. I'll point out the leave-one-out cross-validation package, loo, which lets me produce a plot to look at whether there are any influential values, potential outliers, that might be problematic for the estimates: data point 35, and possibly points 3 and 11. I'm going to leave those in, but I certainly could investigate removing them if this were a real analysis.

Then I look at posterior predictive checks: whether simulations from the model actually match the data, whether it's a reasonable match for the dependent variable. It looks like it is. Those lots and lots of lines are 100 simulations from the posterior distribution and the black line is y, so y versus y-rep, and the empirical cumulative distribution function looks like it's been captured in the same way. Then I can go down to the means and standard deviations of the groups in the bottom plots, and the simulations seem to have captured the descriptives in the data, so I'm pretty happy the model has captured the data well. In a Bayesian model the data are fixed and everything else is a random variable, so I know what the data are and can use them as an anchor to see how well my model performs. There have been criticisms of this approach, so if you want to explore it further, certainly look at some of the debates about using posterior predictive checks.

Now for comparing models. Initially I'll use a Bayes factor. Remember there are three priors; the models that did best used the skewed distribution with random intercepts under all three priors, but now I can compare which prior supports the data better, if you like. Comparing the measurement-constrained prior with the weakly informative prior first: the data are 18 times more likely under the model with the measurement-constrained prior, the one putting probability on values around plus or minus 2.5 meters per second, so that model came out as a better fit, strong evidence on a Bayes factor scale. Comparing the measurement-constrained prior with the informative prior, interestingly, the evidence is even stronger that the measurement-constrained prior is the better model of the data, which is interesting in itself: it suggests the data don't seem to reflect the differences found in previous studies. Now, comparing models using leave-one-out cross-validation is a different system of comparison, and again, if you want to read the literature on this, you'll find debates about what are called M-open and M-closed settings, so have a look at that if you want to decide which method to choose.
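In code, those checks and comparisons look roughly like this, assuming three fits like the one sketched earlier, one per prior; fit_wip, fit_mcp and fit_inf are hypothetical names for the weakly informative, measurement-constrained and informative versions:

```r
# Sketch: diagnostics and model comparison for the fits described above.
library(loo)

rhat(fit_mcp)                          # convergence: want values near 1.00-1.01
plot(fit_mcp)                          # trace plots and marginal posteriors

loo_mcp <- loo(fit_mcp)                # PSIS-LOO cross-validation
plot(loo_mcp)                          # Pareto-k plot flags influential points

pp_check(fit_mcp, ndraws = 100)        # y vs 100 simulated y_rep curves
pp_check(fit_mcp, type = "ecdf_overlay", ndraws = 100)
pp_check(fit_mcp, type = "stat_grouped", stat = "mean", group = "group")

bayes_factor(fit_mcp, fit_wip)         # prior comparison via bridge sampling
loo_compare(loo_mcp, loo(fit_wip), loo(fit_inf))  # elpd differences with SEs
```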
With the LOO-IC, the lowest value is the best, and these are negative values; in this case minus 19.4 is lower than minus 11.6 or minus 16.1. The LOO-IC works in that way similarly to the AIC, the Akaike information criterion, if you're used to comparing traditional models. On this criterion the weakly informative prior seems to give the best model, so a bit different from what came out with the Bayes factor. Comparing the expected log predictive densities, what I'm interested in is not just the difference but how the difference compares to the standard error of the difference, and it's looking like the measurement-constrained model is not quite as good under leave-one-out cross-validation, whereas it was with the Bayes factor. So there are some differences between the two systems, but either way these look like the best models; the informative prior clearly didn't come out well under either model-comparison method.

This first plot is a standard plot from the brms package, a conditional-effects plot. On the y-axis we've got sprint speed in meters per second, on the x-axis we've got the measurement times, two hours post and 24 hours post-game, and we've got the two groups, treatment and control. What do those error bars represent? Not standard deviations: they represent the 95% credible interval, and the point is the most probable estimate for each group. We can see quite clearly that the treatment group is on average faster than the control group at the two-hours post-game measurement, and also at 24 hours post-game, but you can see the larger credible intervals there: much more uncertainty, and the intervals clearly overlap strongly.

We can also plot distributions, so let's look at those next to pairwise comparisons. Here we've got posterior distribution plots of the pairwise differences between each of the conditions and time points. The dotted line represents zero, the yellow distribution is our prior distribution, and the blue distributions are the posterior distributions of the differences, so the peak is the most likely difference. If we were doing this for real we might want to question some of these: some are multimodal, so we might have some issues there, though it's not so bad for the questions we're asking, and I'm not going to rerun the model, as this is just for illustrative purposes. So what we've got here is one of the points of difference we want to know about: at the post measurement, do the control and treatment groups differ? There's a 100% chance the difference is less than zero. If we look at the second one, the 24-hour treatment-control comparison, there's more of an overlap, as there looked to be on the conditional-effects plot, and we calculate an 88.4% chance the difference is less than zero, which means a positive effect, because we're taking the treatment away from the control. So whereas the protocol has clearly been effective two hours after the game, it's more uncertain, as we saw on the initial plot, 24 hours after the game.
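Those probability statements come straight from the posterior draws; a sketch, again with hypothetical fit and coefficient names:

```r
# Sketch: direct probabilities from the fitted model's posterior draws.
library(posterior)

draws <- as_draws_df(fit_mcp)          # one row per posterior draw
mean(draws$b_groupTreatment < 0)       # P(the group coefficient is below zero)
mean(draws$b_groupTreatment < -0.5)    # P(it is beyond some chosen value)

# brms' hypothesis() gives the same kind of answer with an evidence ratio:
hypothesis(fit_mcp, "groupTreatment < 0")
```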
As I mentioned previously, it's not just relative to zero that we can calculate probabilities; we can calculate the probability beyond any particular value, and we know there'll be some measurement error, or we might in fact have a good idea of what an important difference is. To illustrate that I've just gone with a 0.5 standard deviation difference between the groups, set, if you like, as a region of practical equivalence: we might say this region is equivalent to zero, whether because of measurement error or because only values beyond it are ones I consider important. I could choose any value; I've just chosen a standardized difference of 0.5, but I could go with a raw difference and calculate whatever I want. So this is the probability of a difference beyond a medium effect, and it suggests there's a 99%, in fact 99.6%, chance of such a difference two hours post-game, so the treatment group has done better. When we go down to 24 hours we've got 68%, so almost a 70% chance: there is some evidence of a difference, but we can't really claim to be sure. We'd have to say we're really unsure about how effective the protocol is 24 hours after the game, but we're pretty sure it's effective two hours after.

So what can we suggest to practitioners? Quite simply, we can use the direct probabilities, because they're actually quite reasonable to understand: there's a 99.6% chance the intervention improves sprint speed after two hours compared with not doing it, so the intervention is worth investing in if that's important, say where you're in a tournament with two hours between games; there this intervention is well worth using and investing time and resources into. A quote from John Tukey that I think is really pertinent to Bayesian analysis: "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." With Bayesian analysis you can focus on the questions you usually want to ask: how effective is my intervention? What's the probability that I've got an effect? What's the probability of an effect of a particular size? I think those questions can be really usefully addressed with Bayesian data analysis.

So, to review. Prior scientific knowledge can be incorporated into Bayesian models very easily; that's possible with classical models but much more difficult, and in fact it's the basis of Bayesian models. I hope you've seen that the results of Bayesian analysis are very intuitively interpreted: we can talk to practitioners, coaches, other researchers and readers of our research in terms of probabilities they use in everyday language, "there is a 90% chance this intervention works", for example. And complex models that are problematic for non-Bayesian analysis can be constructed and used: the last example compared a series of different response distributions, allowed for heteroscedasticity (we modeled sigma across the different groups), allowed random intercepts and in some cases random slopes, and incorporated different degrees of prior knowledge, so a certain level of complexity. I hope this has encouraged you to look more into the possibility of Bayesian modeling for your own analysis, and that at the least you enjoyed looking at a different type of statistical analysis method.

Thank you ever so much again, Tony, for that, and thank you for jumping on a call now for some follow-up Q&A after the pre-recorded lecture.
And yes, I think the first thing I wanted to discuss was what advice you've got for anybody wanting to start doing some of the things you talked about in your lecture.

Yeah, thanks, Stuart. One of the things I'd say is that the JASP package is really user-friendly. I know SPSS offers some common Bayesian tests now, but I think JASP really focused on that from the start. One useful thing you can do, even where you've conducted research before or are drawing on the literature, is, I'll just find the name of it, the Summary Statistics module in JASP, where you can take previously reported results and reanalyze them. I think that's a useful one for simple analyses like correlations, and you can play with the prior, so you can look at the plots and see how things shift under different prior assumptions. That's perhaps a very easy way to get into it: if you've conducted, or are very familiar with, research that's done fairly straightforward difference tests or relationship tests, you can put those summary statistics into that module and it will give you an idea of how different prior assumptions would shift things, and you don't have to run any analysis yourself. I think that's a useful, gentle introduction. Following that, take data you've collected and have a look at some of the basic tests, because they're very similar: it's user-friendly in the sense that they've got t-tests, paired t-tests, ANOVA, ANCOVA and regression, and now they've added generalized linear models, mixed models and mixed-effects models generally, so it's growing. It's developed at the University of Amsterdam, and they constantly update things; when a new update comes out there are several different modules you can add. So I think that's a useful, easy way in, because it's quite familiar for those who've used traditional statistics, and the traditional statistics are on there too, so it's very easy to compare results: if you do an independent t-test traditionally, you can look at how differently that might come out with different priors on a Bayesian t-test. That's probably an easy way in, really.

Okay, yeah, I'd second that as well, having had a quick play around myself. I've not used the Summary Statistics module, but I'm also a big fan of JASP, which I think stands for Jeffreys's Amazing Statistics Program, something like that. As you say, even for frequentist or traditional statistics it's very easy and user-friendly, and if you use the default settings it's as easy as clicking Bayesian Independent T-Test instead of Independent T-Test, or Bayesian Regression instead of Regression. But, and I was going to ask this later, that leads into it: something I've been wondering in my own research is whether it's maybe just too easy. With all the default settings, is Bayesian analysis too easy? A bit of background on my own journey with this: I initially started using some Bayesian analysis, but I'm almost at the point now where I've realised I don't understand it well enough, and next time I use it I want to do some more of what you did in the lecture, actually playing around with different priors. So potentially, yes, it's very, very easy; is that necessarily a good thing, or are there some negatives to it?
It depends what you want. The model types in JASP are limited: you've got essentially a Gaussian, a normal distribution, for those models, and a difference test is limited to a standardized difference, so the priors are on standardized differences. You can use informative priors, on t-tests for example, but on that standardized-difference scale, and the distributions are all symmetrical. So where you want to do something a bit different, something a little more challenging in terms of modeling, you might want to look at other programs. I think the pedagogical benefit of using something like Stan, even though it's a steep learning curve, is that you really have to specify the model: you specify priors on everything. A gentle introduction to that is the package I mentioned in the presentation, Paul Bürkner's brms package, or rstanarm, the other Stan-linked package; both use R code. Of course this is only easy if you use R, but the code is very similar to using other packages for frequentist statistics in R, so it translates very easily. You can also generate code from brms to see what the Stan code would look like; it's not exactly as you'd write it in raw Stan, but you can see which priors are on which elements. Because while in JASP you might choose just a prior for the standardized difference, if you're building a multi-level model there are also priors on the intercept, on the beta values, on the standard deviations. The defaults give you a useful place to start, because they're designed to produce results very similar to frequentist statistics, and you can interpret those in a Bayesian way; but in essence, once you've become familiar with it, it gives you the chance to play with different priors, and that's very useful. It was a crude example I gave in the presentation, just a prior on the difference, not looking at the intercept: a standardized difference chosen from something like a confidence interval from frequentist statistics and a measure of central tendency, a mean difference. But you can do things much better. In the same way that I showed an example of posterior predictive checks, you can do prior predictive checks: in the brms code you just set sample_prior to "only" (sketched below), and that allows you to play around with how the priors influence results. Sometimes, in fact, as in the example I gave, an informative prior actually makes results more conservative, because it's anchoring the difference, so it made the differences a little smaller. And you could say, well, even though that didn't come out as the best model, it may be the best choice, because maybe that prior data is more substantial: it was taken from a meta-analysis or a series of studies rather than just my study. So it depends where you start.
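A quick sketch of the prior predictive check Tony mentions, reusing the hypothetical model and names from earlier:

```r
# Sketch: prior predictive check in brms -- sample from the priors only,
# ignoring the likelihood, then eyeball the implied outcomes.
fit_prior <- brm(
  bf(speed ~ baseline + group * time + (1 | athlete), sigma ~ group),
  family = skew_normal(), prior = priors, data = dat,
  sample_prior = "only"            # draws come from the prior, not the posterior
)
pp_check(fit_prior, ndraws = 100)  # do simulated sprint speeds look plausible?
```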
But I think that flexibility and understanding come from having to model something a little more complicated: it gives you an idea of what these things actually are, what the priors are doing and how they influence results. So JASP is a really excellent start, and it's also gaining in complexity in terms of what you can do, and who knows where that will end up. But to do things in a little more detail, and perhaps pedagogically to understand things, even if it's a steeper learning curve, try something like Stan, integrated with R or Python, whichever you're comfortable with, depending on your background; packages like brms or rstanarm are useful introductions to that. In terms of texts, I do like John Kruschke's book Doing Bayesian Data Analysis; I've done a lecture alongside him in Oslo and he was excellent, he's really good. And there's Richard McElreath's book Statistical Rethinking; both of these are on their second editions now. They start very basic and build models up, which is really interesting pedagogically. McElreath uses his own package, but the code has been translated for Stan and a range of other languages, Python and so on. Is it quick? Not necessarily; you have to work through it, work with it, but in terms of understanding I think it's really good.

Brilliant, thanks. I think a very good point you made early in that answer was that even if you use all the default settings, you're essentially doing something very similar to what you would in a frequentist analysis anyway. From when I've played around with it, if you use something like a Bayes factor of 3 as your cut-off you generally get very similar results, just with a slightly higher standard of evidence, so I guess it is quite comparable. And another worry that maybe you can help put to bed: when people see you playing around with priors, they maybe start to think this is getting a bit subjective. Rather than just saying "I ran a t-test", you're now saying the result depends on the decisions you make around these priors. Can you speak to that a little?

Yeah. One of the things I think is useful is actually running different priors and seeing what effect they have on your inferences. In some of the analyses I've actually done, informative priors have tended to make the differences and relationships more conservative than they would otherwise have been, so it hasn't inflated things, quite the opposite. And I've generally used priors that, and this is not a technical term, it's just something I used in the presentation, I called measurement-constrained priors. What that means is that I've looked at what the possibilities are on the measurement scale. In the example I gave, I'm not an expert in sprint speeds, but I can say roughly what the highest speed a human can reach is, and if I say the biggest possible difference runs from the highest sprint speed somebody could reach down to, say, somebody falling over, that's the maximum it could be.
So if I put probability across that range, I'm very safe: I'm not skewing anything, and nobody's going to disagree if I allow for speeds of up to, say, 20 meters per second. I'm constraining the prior within the measurement framework, and that tends to actually make things more conservative; an extreme value is not going to pull the estimates. So, to answer your question, I'd try out different priors. You obviously have to justify them, that's one of the things you need to do; if you don't know much about something, a weakly informative prior, whatever that means exactly, is useful. But try these things out. In one paper I deliberately presented estimates under different priors, because it showed that the prior actually made the result more conservative: a frequentist analysis, which is roughly like a uniform prior, would have given a more extreme result than the Bayesian analysis with an informative or measurement-constrained prior. So don't be frightened of priors. I'm not saying that with Bayesian analysis you can't, as with frequentist analysis, pick and choose to manipulate things, if you're of that mind, but most people are honest and want to present the evidence fairly, and looking at the effect of different priors is the best way of doing that. Sometimes, when you've got lots of data, the priors get washed out; but we often use quite small sample sizes, given the way we test things, lots of intensive lab tests and so on where it can be difficult to get really big data sets, and using prior information can be quite useful there. Seeing how it affects your results is one way not to be frightened of the worry that this subjective decision is going to inflate things; in practice I think it often does the opposite.

Yeah, and again, I think the excellent point you made there was about justifying the priors. It's the same as any decision you make in your statistical analysis, or any of your methods: if you justify why you've made each decision, people can follow that, and it's there for reviewers or readers to evaluate, and I think that's fine then. The other point you mentioned was some of your previous research; for the benefit of people watching this, some examples of papers where Tony's been involved in using Bayesian analysis are linked below the video, so if you want to go and look at some examples, they're down there. And I think probably the last slightly technical question I had was, again, around priors, but about getting information for your priors from the literature. Can it be as simple as: another study in a related area found this parameter estimate and confidence interval, or this mean and standard deviation, so I'm going to use those values as my prior? Or is there anything more complicated that needs to be done?

It could be as simple as that, Stuart. It depends really on what model you're producing, but it can be as straightforward as that. What I'd say is that it's really worth doing those prior predictive checks to see whether that prior is reasonable, because things we might not think of can have an impact.
So it could be just that straightforward; the example I gave was a very basic difference between control and treatment groups from previous work. But realistically you'd want to check that prior, and check it against other priors: what would a default prior give, what would changing the prior slightly do? Remember that a reported result is still essentially descriptive; in a frequentist analysis it would really be a mean difference, for example, with a confidence interval on it, whereas we're often trying to estimate a population value, the data-generating process, or however you want to phrase that, and a single study may not be a good reflection of it. So you still want to hedge your bets a little: don't make the prior too constrained, make it a little looser, and have a look. The thing is to simulate where you can, and look at the influence particular choices have on the analysis and the outcomes. As you saw, in the example I gave all of the priors made no difference in terms of the conclusions. Where a prior does change your conclusion, that's where it needs more scrutiny; if it doesn't, there's not really an issue. Where it does change things, where you'd be making a different claim than under the rest of the priors, that's where you have to clearly justify the prior, clearly justify what you're doing. So it could be as simple as that, but you need to check it against other options, in terms of prior predictive checks and so on.
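As a sketch of that simple case, translating a previously reported mean difference and its uncertainty into a brms prior; the coefficient name is hypothetical, the numbers are illustrative, and, as Tony says, you'd loosen it rather than match the reported interval exactly:

```r
# Sketch: an informative prior built from a previously reported effect,
# e.g. a mean difference of -0.5 with a standard error of about 0.25.
# Widening the SD (here to 0.5) hedges against over-trusting one study.
informative <- prior(normal(-0.5, 0.5), class = "b", coef = "groupTreatment")

# Refit the earlier hypothetical model with it and compare conclusions:
# fit_inf <- brm(bf(speed ~ baseline + group * time + (1 | athlete),
#                   sigma ~ group),
#                family = skew_normal(), prior = informative, data = dat)
```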
Okay. So you just mentioned the possibility of a prior changing your conclusion, which leads to another thing I was going to mention, just to be annoying really. You said at one point that, in terms of frequentist statistics, you're not a fan of categories of effect size. When, with Bayes factors, we say below 3 is anecdotal, 3 to 10 is moderate, above 10 strong, then very strong, extreme and so on, is that the same thing? Are there limitations in saying 2.9 isn't an effect but 3.1 is?

Absolutely, absolutely, yeah. Any time you put a category on a continuous variable you get those problems: 2.9 versus 3.1, what's the difference? Very little, really. For standardized differences I'd rather interpret in some way like Cohen's U3, where you can express it as a percentage for people who don't understand standard deviation differences; and I think the Bayes factor is the same: it's how many times more likely the data are, and you make a judgment and a claim based on that. If the data are a hundred times more likely under the research hypothesis than the null, I don't think anybody would argue with a reasonable conclusion drawn from that; where it's 2.9 or 3.1, you're at the edge of a category. So yes, hard category boundaries are, I think, always problematic on continuous variables, and I'd caution against them. I agree that people want to know what an effect size means, and it's okay initially to look at those sorts of categories, but there are issues with them, and the same with the Bayes factor: to introduce the idea it's useful, but using it as an absolute, mechanistic decision-making tool is something I'm really against generally.

Okay. So I've got one more question, and this might be a yes or no: with Bayesian analysis, do you need to correct for multiple comparisons like you would with frequentist tests?

You do if you're doing hypothesis tests. I think JASP offers some comparison corrections, though I'm not sure it does with Bayes factors, actually. If you're doing Bayes factors, that's a hypothesis test, so in a similar way you can correct; you won't necessarily need to if you're doing parameter estimation, because you're not doing the same thing, you're trying to get the best estimate of a difference. It's where you're trying to make a decision for or against a null hypothesis that it applies. I don't really do that much hypothesis testing, to be honest, but I think if you look in JASP there is some control for multiple comparisons; you'd have to check.

Okay. And then I think the last thing I was going to mention, though you've probably covered most of it: when we spoke before, I asked about giving more information to readers or reviewers so that they can confidently evaluate a paper that's used Bayesian analysis, even if they're not used to doing it themselves. You've discussed a lot of things that would be useful there, especially justifying all of the decisions and the priors, but is there anything else, briefly, that would help people decide whether something they're reading has done Bayesian analysis the right way or the wrong way?

I think the assumptions, generally. If you use a linear model with that sort of likelihood, the assumptions don't change; what changes is essentially which quantities are treated as random variables. So similar assumptions to the frequentist ones apply: if you're doing a regression, for example, you might assume the errors are Gaussian-distributed to get your intervals, and you do the same in a Bayesian analysis; you use a distribution, and it's a Gaussian distribution. It's easier in the Bayesian setting to try different distributions and check, so you can look at a skew-normal, or at what a t response distribution looks like, but essentially model checking is, in a way, the same. If you're claiming a linear model, those assumptions still hold; you don't suddenly magic those things away, so you still have to justify claims of independence or whatever else, whether for multi-level models or whatever type of model, the same assumptions in terms of the model still hold. You might do some additional things, as I said, posterior predictive checks, prior predictive checks; obviously the estimation approach is different.
But in terms of the likelihood, it's the same as for a frequentist analysis, so watch out if someone is trying to pull the wool over reviewers' eyes by saying they don't need all of that. The other thing is where people say Bayesian analysis is okay for small samples. It is, but not if you use a very, very weak prior; then it's really not that much better than a frequentist analysis. On the whole, can it be useful for small samples? It can, if you're using some reasonable prior information, because what you're doing is using that information and then adding your data to it. Whereas if you've got very few subjects and you're using a very weak or uniform prior, it's really not that different from doing a similar thing with a frequentist analysis. It's not magic: if you've done bad data collection, you're going to get bad results.

Excellent, yeah, completely agree. And I think that last point's really useful, because you do hear within sports science, especially with the issues around magnitude-based inference, people saying "okay, now I'm going to use Bayesian analysis for small samples", and it's a very good point that the benefits come from the prior. If you don't really have any prior information, then you've still just got a very small sample, and you're not adding any extra information, because there is no informative prior.

Yes. Okay, excellent. Well, thank you ever so much, Tony; it's been really useful, and thanks for joining for the extra chat now as well.

Thank you, Stuart. Thank you.

And for everyone else, I know we've had it on the screen, but keep an eye out for the last couple of lectures over the next two weeks. Kristin Sainani, who gave an excellent lecture on statistics, is back by popular demand with a talk on tips for scientific writing, something else she delivers online courses around, and then I'm really excited for Walter Herzog's lecture on muscle mechanics to finish it off. Yeah, thanks, Tony. Thank you very much. Thank you. Cheers.