RUU #19: Inference on the free binomial model

Uploaded by trondreitan on Feb 28, 2008

In this clip, I study inference on the free binomial model. Recall that the binomial distribution gives the probability of a given number of successes, k, in a given number, n, of repeatable independent trials. A fixed success probability, p, is assumed (hence repeatable trials). The object is to gather information on the value of p and to handle that information using Bayes' formula, since p will typically be uncertain a priori.

The example is from a game called Age of Wonders (Shadow Magic), which I warmly recommend, by the way. (It's an excellent time waster :) I'm studying the probability of a knight winning against a dragon by running 100 separate «battles». The result is 9 surviving knights. I then study that data using probability theory. The results include the posterior distribution of p (the probability of each value of p given the data), estimates of p, the standard deviation of p, a 90% credibility interval for p (an interval of p where more than 90% of the probability is situated) and predictions of future events based on the posterior distribution.
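
The quantities listed above can be sketched in a few lines of R. This is only an illustration, not the code from uncert19.R: it assumes a flat prior over the 101-point grid p = 0, 0.01, ..., 1 that is mentioned in the comments, and the variable names are made up for this example.

```r
# Sketch only: the flat prior and all variable names are assumptions,
# not taken from uncert19.R.
p <- seq(0, 1, by = 0.01)                 # grid of candidate success probabilities
prior <- rep(1 / length(p), length(p))    # assumed flat prior over the grid
n <- 100                                  # battles fought
k <- 9                                    # surviving knights (successes)

lik <- dbinom(k, n, p)                    # Pr(data | p) for every grid value at once
post <- prior * lik / sum(prior * lik)    # Bayes' formula, normalised over the grid

p.mode <- p[which.max(post)]              # most probable grid value
p.mean <- sum(p * post)                   # posterior expectation of p
p.sd <- sqrt(sum((p - p.mean)^2 * post))  # posterior standard deviation of p

# Equal-tailed interval: cut at most 5% of the probability from each tail,
# so the interval holds more than 90% of the posterior probability.
cum <- cumsum(post)
lower <- p[which(cum >= 0.05)[1]]
upper <- p[which(cum >= 0.95)[1]]
coverage <- sum(post[p >= lower & p <= upper])

cat("mode:", p.mode, " mean:", round(p.mean, 3), " sd:", round(p.sd, 3), "\n")
cat("interval: [", lower, ",", upper, "]  coverage:", round(coverage, 3), "\n")
```

With a flat prior the posterior mode coincides with the maximum likelihood estimate 9/100, while the posterior mean sits slightly higher because the distribution is skewed to the right.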

The slides can be found here:
http://folk.uio.no/trondr/uncert19.pdf

R code for doing the inference can be found here:
http://folk.uio.no/trondr/uncert19.R

The statistical programming tool R can be downloaded freely, here:
http://www.r-project.org/


Uploader Comments (trondreitan)

  • thx Dr trond

    I have a problem drawing the binomial distribution: after 51 values of x it gives an error and does not give any values for the rest of the 101. In Excel the function is called BINOMDIST(no. of success trials, no. of independent trials, probability of success of each trial, FALSE) = dbinom(x,n2,p[i]), so the no. of success trials should equal the probability of success of each trial.

    Can this method be used with categorical prediction, or only for numerical prediction?

    thx for your playlist, I learned a lot from it

  • @besbesmany In the R program, a for-loop is used for going through all possible values for p. For each single possible value for p, dbinom(x,n2,p[i]) is called, where p[i] is the i'th value for p. Only x takes several values here, ranging from 0 to 50. It may be that Excel needs a single value rather than an array, in which case you need a double for-loop: one ranging over the possible outcomes and one ranging over the possible values for p.

  • (cont) It is the for-loop that then goes through all 101 values for p (a stand-in for the uncountably infinite possible values of p). Inside the for-loop, the routine calculates the outcome probability for all 51 possible outcomes for each possible value for p in one fell swoop using dbinom(x,n2,p[i]), since x is an array here. It returns an array of probabilities, one for each element in x. If Excel doesn't allow vector inputs, you would instead need another for-loop going through x.

  • (cont 2) It may be valuable to try to run the code in R so you see what is intended. If you just type the name of a variable or a function, you can see what it contains. R is free, so anyone can use it. It may then be easier to replicate the results in Excel.

  • Dear trond, thanks a lot for the video. I have a problem with the prediction of the next 50 battles: pred=pred+p.post[i]*dbinom(x,n2,p[i])

    I plotted the case in Excel but I couldn't understand the prediction dbinom(x,n2,p[i])

    x is only 50 numbers but p[i] is 101 numbers, how can I get this binomial distribution??

    I can send you the Excel sheet to help me, but tell me how

    Please make more prediction examples, especially if we have a database with categorical columns: how can I use the Bayes rule and max likelihood?

  • @besbesmany See also my series on "YT Identity Survey" for some more on testing with categorical data. In that case there are both several "columns" and "rows". Having both columns and rows means that the data have more than one categorical bin they fall into. For instance, one can be both female and Christian. The object is to find out if different rows have different column probabilities or not, i.e. if there is dependency between the columns and rows (between gender and religion).
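
The replies above about dbinom(x,n2,p[i]) can be illustrated with a small sketch. It fills the same table of outcome probabilities twice: once with one vectorised dbinom call per value of p (x is a vector of 51 outcomes, p[i] a single number), and once cell by cell in a double loop, which is roughly what a spreadsheet has to do. The value n2 = 50 and the 101-point grid come from the discussion; the matrix names are made up for this example.

```r
# Sketch only: probs.vec and probs.loop are illustrative names.
n2 <- 50                         # number of future battles
x <- 0:n2                        # 51 possible outcomes: 0, 1, ..., 50 successes
p <- seq(0, 1, by = 0.01)        # 101 candidate values for the success probability

# Vectorised version: one dbinom call per p[i] returns all 51 probabilities.
probs.vec <- matrix(0, nrow = length(p), ncol = length(x))
for (i in seq_along(p)) {
  probs.vec[i, ] <- dbinom(x, n2, p[i])
}

# Double loop: one cell per (p[i], x[j]) pair, written out from the
# binomial formula choose(n, x) * p^x * (1 - p)^(n - x).
probs.loop <- matrix(0, nrow = length(p), ncol = length(x))
for (i in seq_along(p)) {
  for (j in seq_along(x)) {
    probs.loop[i, j] <- choose(n2, x[j]) * p[i]^x[j] * (1 - p[i])^(n2 - x[j])
  }
}

# Each row is a complete binomial distribution for one value of p.
print(isTRUE(all.equal(probs.vec, probs.loop)))
print(range(rowSums(probs.vec)))
```

The table has 101 rows and 51 columns, so the number of p values and the number of outcomes do not need to match. In a spreadsheet, each cell would presumably hold one BINOMDIST call with a single outcome, n2, a single p and FALSE for the cumulative flag.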


All Comments (18)

  • ok i'll read all your replies here and tell you if i could solve my excel problem :)

  • (cont 4) When taking the continuous limit, this sum of binomial distributions is known as the beta-binomial distribution, by the way. It is the predictive distribution for the number of successes in new trials in a binomial model with uncertain success rate, p. Ideally, this example should show the difference between a specific model prediction (known p) and an overall prediction based on previous data.

  • (cont 3) The Pr(x|p) is of course the binomial distribution for a given success probability. Since we are not certain, after 100 trials, what the success probability is, we get a weighted sum of binomial distributions, using what we have learned about the success probability, p, from the previous trials. This sum of binomial distributions is not itself a binomial distribution. It is more spread out than what you get from a max likelihood estimation, because we are not certain about the value of p.

  • (cont 2) The code "pred=pred+p.post[i]*dbinom(x,n2,p[i])" gives a probability-weighted sum of the binomial distribution. The principle behind this is rule 6 in clip 2b: Pr(A)=sum Pr(A|B_i) Pr(B_i) where the sum runs over all possible B_i's in a partition of models. In this case the p's take the role of B_i and the outcome x takes the role of A. The probabilities in question are the posterior probability after handling the first 100 trials, which then form the prior for handling the next 50.

  • (cont) The possible success probabilities can range continuously from 0 to 1, but I wanted to avoid taking the continuous limit (as that would involve integral calculus and make the mathematical aspect much heavier). So instead I chose a discrete range of possible values for 'p': 0.0, 0.01, 0.02, ... , 1.0. Since 'p' is the 'model' part of Bayes' formula here, while 'x' is the data part, there is no need for these to match in number.
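
The prediction step described in the (cont) to (cont 4) replies above can be sketched as follows. The names pred, p.post, x, n2 and p follow the line quoted in the thread; everything else (the flat prior, plug, betabin, var.of) is an assumption made for this illustration and is not taken from uncert19.R.

```r
# Sketch only: flat prior assumed; plug, betabin and var.of are illustrative names.
p <- seq(0, 1, by = 0.01)            # discrete stand-in for the continuous range of p
p.post <- dbinom(9, 100, p)          # likelihood of 9 successes in 100 trials
p.post <- p.post / sum(p.post)       # posterior on the grid (a flat prior cancels out)

n2 <- 50                             # number of new battles to predict
x <- 0:n2                            # possible numbers of successes

# Rule Pr(A) = sum Pr(A | B_i) Pr(B_i): weight each binomial by the posterior of its p.
pred <- rep(0, length(x))
for (i in seq_along(p)) {
  pred <- pred + p.post[i] * dbinom(x, n2, p[i])
}

# Specific-model prediction: plug in the maximum likelihood estimate of p.
plug <- dbinom(x, n2, 9 / 100)

# Continuous-limit counterpart: beta-binomial with a = 9 + 1 and b = 91 + 1,
# which is what a flat prior on [0, 1] gives.
a <- 10
b <- 92
betabin <- choose(n2, x) * beta(x + a, n2 - x + b) / beta(a, b)

# Variance of the number of successes under a given predictive distribution.
var.of <- function(prob) sum(x^2 * prob) - sum(x * prob)^2

cat("variance, weighted sum:", var.of(pred), "\n")
cat("variance, plug-in p   :", var.of(plug), "\n")
cat("max difference from beta-binomial:", max(abs(pred - betabin)), "\n")
```

The weighted sum comes out more spread out than the plug-in binomial, and on this grid it should agree closely with the beta-binomial formula, which is the continuous-limit version of the same sum.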
