The t-distribution
If you have a normally distributed population and you take random samples of size n from it, you can calculate the sample mean (X bar) and the sample standard deviation (S) of each sample. If you also know the population mean (mu), you can calculate the statistic t = (X bar - mu) / (S / sqrt(n)). If you continue to resample the population randomly, you build up a series of these values whose distribution is the t-distribution with n - 1 degrees of freedom.
I thought that I had better work this through using Minitab and Excel. I used Minitab to generate 10,000 normally distributed values. I then wrote a macro in Excel to obtain 1000 random samples of size n from the 10,000 values and to find the sample means and sample standard deviations. Since the population mean is already known, the t statistic is easy to calculate for each sample. The spreadsheet then output the t values, which I fed back into Minitab to plot as histograms.
The shape of the t-distribution changes as n increases. To begin with, the t-distribution is like a normal curve but with broader 'shoulders' (heavier tails).
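For anyone without Minitab or Excel, the same experiment can be roughly sketched in Python using only the standard library. This is not the original macro. The function names (simulate_t_values, tail_fraction), the N(0,1) population, the fixed seed, the cutoff of 2 and the choice to sample without replacement are all assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_t_values(population, n, num_samples, rng):
    """Draw num_samples random samples of size n and return one t value per sample."""
    mu = statistics.fmean(population)  # population mean, known in advance
    t_values = []
    for _ in range(num_samples):
        sample = rng.sample(population, n)   # assumption: sampling without replacement
        x_bar = statistics.fmean(sample)     # sample mean (X bar)
        s = statistics.stdev(sample)         # sample standard deviation (S), n - 1 denominator
        t_values.append((x_bar - mu) / (s / math.sqrt(n)))
    return t_values

def tail_fraction(values, cutoff=2.0):
    """Fraction of values lying further than cutoff from zero."""
    return sum(abs(v) > cutoff for v in values) / len(values)

rng = random.Random(1)  # fixed seed so the run is repeatable
population = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # assumed N(0,1) population

# P(|Z| > 2) for the standard normal curve, about 0.0455
normal_tail = math.erfc(2.0 / math.sqrt(2.0))

fractions = {}
for n in (3, 10, 30):
    t_values = simulate_t_values(population, n, 1000, rng)
    fractions[n] = tail_fraction(t_values)
    print(f"n={n:2d}  fraction with |t| > 2: {fractions[n]:.3f}   N(0,1): {normal_tail:.3f}")
```

The fraction of t values beyond 2 is a crude numerical stand-in for the broader 'shoulders' seen in the histograms: it should be clearly larger than the N(0,1) figure for small n and shrink towards it as n grows.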
In the histograms, the overlaid blue line represents the standard normal curve, i.e. N(0,1). As the sample size n increases, the t-distribution moves closer to N(0,1). Here are the normal probability plots for the same data:
The final plot in the group is the original data. The panel names are the sample sizes (n).
I was also curious about the Central Limit Theorem, which can also be demonstrated using the spreadsheet and Minitab. Basically, if you keep sampling your population randomly, the sampling distribution of the sample mean will be approximately normal, with a mean equal to the mean of the main population. The larger the samples, the smaller the standard deviation of the sampling distribution (it falls as sigma / sqrt(n), where sigma is the population standard deviation). The more samples you take, the more clearly that distribution shows up and the more accurate your estimate of the population mean becomes. Here I have varied the sample size (n) and the number of samples taken. The name of each panel is (sample size, number of samples).
This next trellis plot is for a constant number of samples (2000) but the samples are of increasing size.
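The effect of sample size on the sampling distribution can also be checked numerically. The following is a minimal Python sketch, not the spreadsheet used here: it keeps the number of samples at 2000 and increases the sample size, then compares the standard deviation of the sample means with sigma / sqrt(n). The sample_means helper, the N(0,1) population, the seed and the particular sample sizes are assumptions chosen for illustration.

```python
import math
import random
import statistics

def sample_means(population, n, num_samples, rng):
    """Return the mean of each of num_samples random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

rng = random.Random(3)  # fixed seed so the run is repeatable
population = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # assumed N(0,1) population

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)  # sigma of the 10,000 values

results = {}
for n in (5, 20, 80):  # hypothetical sample sizes
    means = sample_means(population, n, 2000, rng)
    centre = statistics.fmean(means)       # should sit close to the population mean
    spread = statistics.stdev(means)       # standard deviation of the sampling distribution
    predicted = pop_sd / math.sqrt(n)      # sigma / sqrt(n)
    results[n] = (centre, spread, predicted)
    print(f"n={n:2d}  mean of means={centre:+.3f}  sd of means={spread:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

The mean of the sample means should stay close to the population mean for every n, while their standard deviation should track sigma / sqrt(n) and so shrink as the samples get larger.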