Basic one-way analysis of variance (ANOVA) is a statistical technique used to assess how likely it is that a group of samples with different means were in fact all drawn from the same population, given the variation within each sample.
ANOVA assumptions
To perform an ANOVA you have to calculate a number of different figures.
There is also a simple worked example.
RSS is the sum, over all treatment groups, of the sum of the squared residuals within each group (each value minus its own treatment mean, squared). Here is the basic technique for finding RSS.
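A minimal Python sketch of the RSS calculation described above. The three groups and their values are hypothetical, chosen only to keep the arithmetic easy; they are not the figures from the worked example, and the function name is just an illustration.

```python
# Hypothetical data: three treatments with three values each.
# These are NOT the values from the article's worked example.
groups = {"X": [4, 5, 6], "Y": [6, 7, 8], "Z": [8, 9, 10]}


def residual_sum_of_squares(groups):
    """Add up, for every treatment, the squared distance of each value
    from that treatment's own mean."""
    total = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        total += sum((v - group_mean) ** 2 for v in values)
    return total


rss = residual_sum_of_squares(groups)
print(rss)  # 6.0
```

Each hypothetical group contributes 1 + 0 + 1 = 2, so the three groups together give 6.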
So it's a bit like a measure for the total amount of variation you are getting across the set of samples.

Explained Sum of Squares (ESS)
ESS gives the 'ideal' situation where all k (in our case 3) treatments have no variability in the samples, i.e. each treatment has n values and all n values in the treatment are the same. This would mean that if the treatment means were indeed different then there would be no doubt that the treatments were different, as the values do not vary (normally or otherwise) around the treatment's average. So ESS is calculated by finding the mean for each treatment, subtracting the total mean (for the entire set) from the group mean, and squaring this residual. Once we have the squared residual we then multiply each treatment's squared residual by the number of samples in that treatment group, and add the results together. Put simply, this gives an indication of how much of the variation is explained by the fact that the data is grouped into treatments.
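The ESS recipe above can be sketched in Python as follows. Again the data is hypothetical (not the worked example's values) and the function name is only illustrative.

```python
# Hypothetical data: three treatments with three values each.
# These are NOT the values from the article's worked example.
groups = {"X": [4, 5, 6], "Y": [6, 7, 8], "Z": [8, 9, 10]}


def explained_sum_of_squares(groups):
    """For each treatment: (treatment mean - overall mean) squared,
    multiplied by the number of values in that treatment."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    total = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        total += len(values) * (group_mean - grand_mean) ** 2
    return total


ess = explained_sum_of_squares(groups)
print(ess)  # 24.0
```

Here the treatment means are 5, 7 and 9 and the overall mean is 7, so ESS is 3 x 4 + 3 x 0 + 3 x 4 = 24.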
Now we need to generate our last value, the Total Sum of Squares (TSS). For this we need the overall mean for all the values regardless of the treatment. If the groups had NO EFFECT WHATEVER on the position of the mean for the samples (i.e. all the samples were really from the same overall population under the same conditions) then RSS and TSS should be roughly equal. However, we are not quite there yet.

Interestingly, if you take ESS and add RSS then you get TSS.
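As a quick check of the relationship between the three sums of squares, here is a small Python sketch on hypothetical data (not the worked example's values). It computes TSS directly from the overall mean and compares it with ESS + RSS.

```python
# Hypothetical data: three treatments with three values each.
# These are NOT the values from the article's worked example.
groups = {"X": [4, 5, 6], "Y": [6, 7, 8], "Z": [8, 9, 10]}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# TSS: squared distance of every value from the overall mean.
tss = sum((v - grand_mean) ** 2 for v in all_values)

# RSS and ESS, accumulated treatment by treatment.
rss = 0.0
ess = 0.0
for values in groups.values():
    group_mean = sum(values) / len(values)
    rss += sum((v - group_mean) ** 2 for v in values)
    ess += len(values) * (group_mean - grand_mean) ** 2

print(tss, ess, rss)  # 30.0 24.0 6.0
print(abs(tss - (ess + rss)) < 1e-9)  # True
```

The comparison uses a small tolerance rather than `==` because floating-point rounding can appear with less convenient data.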
Anyway, let's leave that for now...

The degrees of freedom (df) show the number of values in the calculation of your statistic that are free to vary. In the calculation of the mean this value is 'n' (the number of samples) because each value is free to change. However, more complex statistics, such as the standard deviation, use the average value in their calculation. This means that there are fewer degrees of freedom: once you know the mean and n-1 of the values, the 'n'th value is no longer free, as it can only be the one remaining value. As a result the system does not have n degrees of freedom. It has n-1 degrees.

Now, to work out the values actually used in the ANOVA statistical test we need to generate a couple of extra statistics. They are the mean squares for RSS and ESS (each sum of squares divided by its degrees of freedom), and to do this we need to know the degrees of freedom for RSS and ESS.

For ESS the number of degrees of freedom is k-1. Please bear in mind the definition of ESS above. This is because we know the overall mean, so once k-1 of the treatment means are known the last one is no longer free to vary.

For the RSS the number of degrees of freedom is n-k, where n is here the total number of values across all treatments. This is because there are k means used in the statistic, so there is a reduction by k degrees of freedom.

The Wikipedia definition for degrees of freedom can be found here. They can only do a better job than I have done here.

Anyway, let's forge ahead... We now calculate the Mean Square for Residuals and the Mean Square for Estimates.
Which can be used to generate the random variable F.
Where...
The distribution of F under H0 is the F distribution with k-1 and n-k degrees of freedom. The value F is often called the F statistic. The F statistic is the one used to identify the likelihood for or against the Null Hypothesis. In all likelihood you will need a computer program to calculate the value of F. You can find some tables for the F distribution here.
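Putting the pieces together, a minimal Python sketch of the F calculation might look like this. It assumes the usual one-way ANOVA form, F = (ESS / (k-1)) / (RSS / (n-k)), with n the total number of values; the data is hypothetical (not the worked example's values) and the function name is only illustrative.

```python
# Hypothetical data: three treatments with three values each.
# These are NOT the values from the article's worked example.
groups = {"X": [4, 5, 6], "Y": [6, 7, 8], "Z": [8, 9, 10]}


def f_statistic(groups):
    """Return (F, df for ESS, df for RSS) for a one-way ANOVA."""
    all_values = [v for values in groups.values() for v in values]
    n = len(all_values)   # total number of values
    k = len(groups)       # number of treatments
    grand_mean = sum(all_values) / n

    ess = 0.0
    rss = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ess += len(values) * (group_mean - grand_mean) ** 2
        rss += sum((v - group_mean) ** 2 for v in values)

    df_ess = k - 1
    df_rss = n - k
    mean_square_estimates = ess / df_ess
    mean_square_residuals = rss / df_rss
    return mean_square_estimates / mean_square_residuals, df_ess, df_rss


f_value, df_ess, df_rss = f_statistic(groups)
print(f_value, df_ess, df_rss)  # 12.0 2 6
```

The resulting F can then be compared against an F table for 2 and 6 degrees of freedom. If SciPy happens to be installed, `scipy.stats.f.sf(f_value, df_ess, df_rss)` should give the corresponding tail probability, but that step is left out here to keep the sketch dependency-free.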
Let's work through a simple example. X, Y and Z are our treatments. They could be anything: fish breeds, training shoe manufacturers, etc. Similarly, the data points could be anything. They are just selected to make the average values convenient and the variances comparable.
The mean values are:
Other key figures are
For RSS 'doing the math' would look something like this:
and...
So our residual sum of squares (RSS) value is 42.375.

Next in line is ESS. Using the average for the whole set:
With the average for the whole set and the definition of ESS, here is what it looks like for our example:
There are a couple of ways of calculating TSS. Now we need to go to each of the X, Y and Z values, find the difference between the value and the overall mean, square that, and add up all the individual numbers.
which comes to:
Now the value of TSS is the total variation across the whole dataset.