Confidence Intervals: ci and centile

In addition to the procedures described in the previous entry, Stata offers some commands for the estimation of confidence intervals for means, proportions, counts, and percentiles (plus, as of version 14, for variances and standard deviations). Note, however, that the complex estimation procedures mentioned in the previous entry (with two of them outlined in more detail in the next two entries) are not available with these commands. On the other hand, a few special estimation procedures are available, particularly for proportions.

Warning: The following procedures may give strange results, such as standard errors of 0 or integer values for the limits of the C.I., which should occur only very rarely. This seems to happen when the display format of a variable is defined so that no decimal places are shown (which may seem perfectly reasonable if a variable has integer values only). It does not always happen under such circumstances; as yet I have not been able to find out the exact conditions that cause this behaviour.

If such strange results occur, change the display format of the respective variable, either via the Variable Manager available in more recent versions of Stata or via a command such as format income %10.4f; this means that variable income will be displayed with a total width of 10 characters, 4 of which are decimal places.
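The workaround can be sketched as follows. This is only an illustration: it assumes Stata's shipped example dataset auto.dta (loaded with sysuse) in place of your own data, and its variable price in place of income.

```stata
* Load Stata's shipped example data (an assumption; use your own data instead)
sysuse auto, clear

* Show the display format currently attached to price
describe price

* Display price with a total width of 10 and 4 decimal places
format price %10.4f

* Check that the new display format has been stored
describe price
```

Note that format changes only how values are displayed; the stored values remain the same.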

Of course, you may also use the format command to influence the number of decimals in the output for other reasons. For instance, the results displayed are frequently more precise than needed; you will not present means or C.I.s with six decimal places to any audience. On the other hand, you may easily do some rounding on your own.

C.I. for means, proportions and counts

With version 14, some changes have been introduced: Command ci has to be accompanied by a keyword that indicates what kind of confidence interval is requested.

Stata version 14

Note that all commands that follow permit varlists, that is, you can request confidence intervals (of the same type) for several variables.

ci means income

will compute a 95 per cent confidence interval for the mean of income. Request a different confidence level with option level(#), with # being replaced by, say, 90, 99, or whatever you like.

With count data, option poisson should be added.
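As a small sketch of the keyword syntax with a varlist and a non-default confidence level, again assuming the example dataset auto.dta and its variables price and mpg (neither is part of the examples in this guide), and a Stata release that supports the keyword form of ci:

```stata
* Example data shipped with Stata (an assumption; use your own data instead)
sysuse auto, clear

* 95 per cent confidence intervals for the means of two variables
ci means price mpg

* The same intervals with a confidence level of 90 per cent
ci means price mpg, level(90)
```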

ci proportion gender

will compute a 95 per cent confidence interval for a proportion of variable gender. Note that the variable(s) to be analyzed must consist of values 0 and 1 only, and the procedure will compute the confidence interval for the proportion with value "1". The confidence interval computed is an exact interval based on the binomial distribution; several other intervals are available which may be requested via the appropriate option.
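To illustrate how alternative intervals are requested, here is a sketch that assumes the example dataset auto.dta, whose variable foreign is coded 0/1. The options wilson and agresti are two of the alternatives listed in Stata's help for ci; see there for the complete list.

```stata
* Example data shipped with Stata (an assumption; use your own data instead)
sysuse auto, clear

* Default: exact interval based on the binomial distribution
ci proportions foreign

* Wilson interval
ci proportions foreign, wilson

* Agresti-Coull interval with a confidence level of 99 per cent
ci proportions foreign, agresti level(99)
```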

In contrast to earlier versions, procedure ci now also offers computation of a confidence interval for the variance (or the standard deviation) of a variable. As an example, use

ci variance income

and add option sd (after the comma) for the standard deviation. An alternative version of the interval proposed by Bonett, D. G. (2006), Approximate confidence interval for standard deviation of nonnormal distributions, Computational Statistics & Data Analysis 50: 775–782, is available via option bonett.
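The options just described might be combined as in the following sketch, which again assumes the example dataset auto.dta and its variable price instead of income:

```stata
* Example data shipped with Stata (an assumption; use your own data instead)
sysuse auto, clear

* Confidence interval for the variance
ci variances price

* Confidence interval for the standard deviation
ci variances price, sd

* Bonett's interval for the standard deviation
ci variances price, sd bonett
```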

Earlier versions

ci income

will compute a 95 per cent confidence interval for the mean of income. This example shows that the default for ci is to compute a C.I. for the sample mean of a metric variable.

ci gender, binomial

assumes that gender is a binary variable with values 0 and 1; it will display the proportion of observations coded "1" and the exact 95 per cent confidence interval for this proportion. Note that a number of different estimation procedures for proportions are available, such as the Agresti-Coull confidence interval.

For counts, use the poisson option (again, see Stata's help for more on this).

ci income, l(99)

Option l (letter l), or level, may be used to obtain a confidence level that is different from the default.
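The old-style commands can be tried out as in the following sketch, which assumes Stata 13 or earlier and the example dataset auto.dta (variables price and foreign). As far as I know, more recent versions accept this syntax only under version control, that is, with a prefix such as version 13: in front of the command; treat this as an assumption and check Stata's help.

```stata
* Example data shipped with Stata (an assumption; use your own data instead)
sysuse auto, clear

* Mean of price with a 99 per cent confidence interval
ci price, level(99)

* Proportion of cars coded 1 on foreign, default exact interval
ci foreign, binomial

* The same proportion with the Agresti-Coull interval
ci foreign, binomial agresti
```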

C.I. for the median and other percentiles

centile income

will compute an exact 95 per cent confidence interval for the median of income. This example shows that the default for centile is to compute a C.I. for the sample median. In other words, the "centile" displayed (under this heading) in the output is the point estimate of the median; the C.I. reported gives the values of the respective observations, not their position in the ordered observations.

There are some options available; they are described in help centile. The most important among these is the option to obtain point and interval estimates for other percentiles. It goes like this:

centile income, c(25)

(with c as an abbreviation for centile). The l option explained above (last entry in the preceding section) is available as well. Other options refer to estimation procedures which in my view typically will not be those you'll want.
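A short sketch of these options, assuming the example dataset auto.dta with its variables price and mpg; the centile option accepts a list of numbers, so that several percentiles can be requested at once:

```stata
* Example data shipped with Stata (an assumption; use your own data instead)
sysuse auto, clear

* Median of price with a 95 per cent confidence interval
centile price

* Quartiles of price
centile price, centile(25 50 75)

* First quartile of two variables, confidence level 90 per cent
centile price mpg, centile(25) level(90)
```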

© W. Ludwig-Mayerhofer, Stata Guide | Last update: 28 May 2017