A Quick Guide to Confidence Intervals

[ 2008-November-14 11:34 ]

I do not have a strong statistics background, so this may be incorrect. Please consult a statistics expert before applying this to your data (many universities have statistics consultants: use them!). For my experiments, I measure some variables which are samples from what I assume to be some Gaussian (normal) distribution, with an unknown mean and deviation. To compute a confidence interval for the sampled mean, you must use Student's t distribution. Effectively, you compute the mean (μ) and standard deviation (s). For N samples, the degrees of freedom (df) is N - 1. You then look up the t value for the value of df and the confidence interval you want from a t table, or using the inverse t distribution CDF. Finally, you get a range: μ - ts/sqrt(N) ≤ μ ≤ μ + ts/sqrt(N). A good explanation of how to compute confidence intervals can be found at HyperStat.

Computing the inverse of the t distribution's CDF is challenging, as it requires careful numerical methods. People do research on this kind of thing. Excel contains a function, `TINV`, that does this for you. Scipy contains `scipy.stats.t.__ppf`, which is implemented by some Fortran code. JSci is a Java library that has an implementation. I ported this implementation to Python, which surprisingly was less challenging than I expected. My implementation uses the bisection method to find the inverse, instead of the Newton-Raphson method used in the original JSci code. This requires more iterations for the same precision, but saved me from having to port the PDF implementation (which is the derivative of the CDF, required for Newton-Raphson).

My Python implementation contains `statistics.tinv`, which is a wrapper around the inverse which looks just like Excel's `TINV` function (albeit less precise). It also contains `statistics.stats` which computes a number of summary statistics from a list of values.

Python implementation: statistics.py

Example Usage:

```>>> import statistics
>>> values = [200, 240, 300, 410, 450, 600]
>>> mean, median, stddev, min, max, confidence = statistics.stats(values)
>>> print mean, median, stddev, min, max, confidence
366.666666667 410 149.354165214 200 600 156.737514156
>>> mean - confidence
209.92915251107823
>>> mean + confidence
523.40418082225517
```