Functions: Statistical (or Probability) Distributions

As the title indicates, presently this section deals with statistical functions only, and a small selection at that. It will hopefully be expanded in the future.

All functions are shown here with the "display" command (abbreviated to "dis" in the examples below), but of course they may likewise be used in programming etc.

The binomial distribution

Finding probabilities of successes

A binomial distribution has two parameters: n, the number of trials, and p, the probability of the outcome of interest ("success"). We say that a random variable has distribution B(n,p).

dis binomialp(3,1,.3)

will display the probability that exactly 1 (one) success will occur in a random experiment with distribution B(3,.3), that is, three trials and outcome probability .3. Stata will render the value .441.

Note that you may write dis binomialp(3,1.8,.3), requesting the probability that you will observe 1.8 successes, which is impossible as the values of a binomial random variable are always integers. Stata will use floor(1.8) instead, that is, 1.

dis binomial(3,1,.3)

will display the probability that 1 (one) or fewer successes will occur in a random experiment with distribution B(3,.3). In other words, Stata will render the value of the cumulative probability function. The probability for 0 (zero) successes is .343, and together with the probability for one success (.441) this will yield a cumulative value of .784.

dis binomialtail(3,2,.3)

will display the probability that 2 (two) or more successes will occur in a random experiment with distribution B(3,.3). In other words, Stata will render the upper-tail (reverse cumulative) probability, that is, the probability of k (the number of successes) or more. As the value for up to 1 success is .784, the probability for 2 or more (that is, 2 or 3) successes by necessity is 1 - .784 = .216, and this is the value Stata will display.
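As a quick check of how the three functions relate, the following sketch lists the point and cumulative probabilities for every possible number of successes in B(3,.3) and then adds the two complementary probabilities discussed above. The loop macro k and the column positions are merely illustrative choices.

```stata
* Point and cumulative probabilities for k = 0, 1, 2, 3 successes in B(3,.3)
forvalues k = 0/3 {
    display "k = `k'" _col(10) binomialp(3,`k',.3) _col(25) binomial(3,`k',.3)
}

* P(X <= 1) and P(X >= 2) cover all possible outcomes,
* so their sum should be 1 (up to rounding)
display binomial(3,1,.3) + binomialtail(3,2,.3)
```

The second column of the loop output should accumulate the values of the first column, ending at 1 for k = 3.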

Finding the parameter p, given probabilities for the number of successes

dis invbinomial(3,1,.784)

will display the parameter p (that is, the probability for success in one trial) that corresponds to a binomial random trial with n = 3 and probability of .784 for 1 (one) or fewer successes. We know from the preceding that this parameter is .3.

dis invbinomialtail(3,2,.216)

will display the parameter p (that is, the probability for success in one trial) that corresponds to a binomial random trial with n = 3 and probability of .216 for 2 (two) or more successes. Again, this parameter is .3.
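Rather than typing rounded probabilities, one may let Stata compute them first and feed the results into the inverse functions. The following is a minimal sketch of such a round trip; the scalar names cdf1 and tail2 are arbitrary and chosen for illustration only.

```stata
* Store the cumulative and the upper-tail probability for B(3,.3) ...
scalar cdf1  = binomial(3,1,.3)
scalar tail2 = binomialtail(3,2,.3)

* ... and recover the parameter p from each of them;
* both lines should display .3 (up to numerical precision)
display invbinomial(3,1,cdf1)
display invbinomialtail(3,2,tail2)
```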

The hypergeometric distribution

This distribution describes the behaviour of a random variable with a binary outcome for samples drawn without replacement. It has four parameters: N, the size of the population, K, the number of successes in the population, n, the size of the sample, and k, the number of successes in the sample.

dis hypergeometric(30,9,3,1)

will produce the cumulative probability for k = 1, i.e., the cumulative probability for obtaining 1 (one) or fewer successes, which is .79310345. In contrast,

dis hypergeometricp(30,9,3,1)

will yield the probability for k=1, which is .46551724.
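To see where these values come from, the point probability can be reproduced from the usual counting formula with Stata's comb() function, and the cumulative probability as a sum of point probabilities. This is only a sketch for the example values N = 30, K = 9, n = 3 used above.

```stata
* P(k = 1): choose 1 of the 9 successes and 2 of the 21 failures,
* divided by the number of ways to choose 3 units out of 30
display comb(9,1)*comb(21,2)/comb(30,3)

* The built-in function should give the same value
display hypergeometricp(30,9,3,1)

* The cumulative probability for k = 1 is the sum for k = 0 and k = 1
display hypergeometricp(30,9,3,0) + hypergeometricp(30,9,3,1)
display hypergeometric(30,9,3,1)
```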

The normal distribution

Normal distributions have two parameters: the mean, referred to by Stata as m, and the standard deviation, denoted by s. As there is an infinite number of normal distributions (with different parameters m and/or s), statisticians often use the standard normal distribution with m = 0 and s = 1.

dis normal(-1.959964)

will display the cumulative probability of the standard normal distribution at the value -1.959964, that is, the probability of obtaining a value of -1.959964 or less. Stata renders 0.025; in other words, -1.959964 is the 0.025 quantile (or 2.5th percentile).

dis invnormal(.025)

will produce the inverse result, that is, the value of -1.959964 which corresponds to the .025 quantile of the standard normal distribution.
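The following sketch illustrates that normal() and invnormal() undo each other, and how the two functions combine for the familiar two-sided 95 per cent interval. The probability values .025 and .975 are simply the ones that match the example above.

```stata
* Round trip: quantile -> value -> quantile; should display .025
display normal(invnormal(.025))

* By symmetry, the .975 quantile is the .025 quantile with the sign reversed
display invnormal(.975)

* Probability mass between the two values; should be very close to .95
display normal(1.959964) - normal(-1.959964)
```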

dis normalden(0)

will display the density of the standard normal distribution at 0, i.e. .39894228 (the maximum, of course). This function also accepts additional arguments to accommodate normal distributions with means and/or standard deviations that differ from those of the standard normal distribution. Thus, dis normalden(0,2) will display the density of a normal distribution with mean 0 and a standard deviation of 2 at the value x = 0, that is, at its mean (the result being half the value for the standard normal distribution). In contrast, dis normalden(0,1,2) will produce an even lower value, i.e., the density at value 0 of a normal distribution with mean 1 and a standard deviation of 2.
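For readers who wish to verify the three-argument form, the density can also be written out from the textbook formula for the normal density. The sketch below does this for the example just given (x = 0, mean 1, standard deviation 2); _pi is Stata's built-in constant for pi.

```stata
* Density of a normal distribution with mean 1 and s.d. 2 at x = 0,
* written out as exp(-(x-m)^2/(2*s^2)) / (s*sqrt(2*pi))
display exp(-((0-1)^2)/(2*2^2))/(2*sqrt(2*_pi))

* The built-in function should give the same value
display normalden(0,1,2)

* With mean 0 and s.d. 2, the density at 0 is half the standard normal density
display normalden(0,2)
display normalden(0)/2
```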

The t distribution

Student's t distribution resembles the standard normal distribution in shape (it is symmetric and bell-shaped, with mean 0) but has heavier tails. Actually, there is (in principle) an infinite number of t-distributions that vary according to their "degrees of freedom" (d.f.). As the d.f. increase, the t-distribution approaches the standard normal distribution. Thus,

dis t(100000000,-1.959964)

will display 0.025, that is, the probability of a value of -1.959964 or less; in other words, -1.959964 is the 0.025 quantile (or 2.5th percentile) of a t distribution with 100,000,000 d.f. With fewer d.f., the result will be slightly larger, as the t-distribution becomes more spread out. For instance, dis t(10,-1.959964) will yield .03922046.

dis ttail(100000000,-1.959964)

will give a value of .975, i.e. the probability of a value of -1.959964 or higher.

The inverse is obtained, unsurprisingly, with the command

dis invt(100000000,.025)

which will yield -1.959964; the function invttail, which takes an upper-tail probability instead, is available as well.
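The relations between these four functions, and the convergence towards the standard normal distribution, may be illustrated as follows. This is merely a sketch; the d.f. values 5, 30 and 1000 are arbitrary choices for illustration.

```stata
* Lower-tail and upper-tail probability at the same value should sum to 1
display t(10,-1.959964) + ttail(10,-1.959964)

* invt takes a lower-tail, invttail an upper-tail probability;
* by symmetry the two results differ only in sign
display invt(10,.025)
display invttail(10,.025)

* The .025 quantile moves towards the normal value as the d.f. increase
foreach df in 5 30 1000 {
    display "d.f. = `df'" _col(16) invt(`df',.025)
}
display "normal" _col(16) invnormal(.025)
```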

The Chi² distribution

The chi-squared distribution is again actually a family of distributions with different degrees of freedom.

dis chi2(1,3.8414588)

will produce .95, which means that the probability of obtaining a value of 3.8414588 or less is .95, or, put differently, that 3.8414588 corresponds to the .95 quantile, in the case of a chi-squared distribution with 1 d.f. In contrast, dis chi2tail(1,3.8414588) will return .05.

dis invchi2(1,.95)

will yield 3.8414588, and dis invchi2tail(1,.05) will produce the same value.
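As a small consistency check, the sketch below confirms that chi2 and chi2tail are complementary. It also uses the standard result that a chi-squared variable with 1 d.f. is the square of a standard normal variable, so that the critical value 3.8414588 should equal the square of the two-sided normal value of about 1.96.

```stata
* Lower and upper tail at the same value should sum to 1
display chi2(1,3.8414588) + chi2tail(1,3.8414588)

* Square of the .975 quantile of the standard normal distribution ...
display invnormal(.975)^2

* ... compared with the .95 quantile of the chi-squared distribution with 1 d.f.
display invchi2(1,.95)
```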