Being of a useful nature...

Here are the results of a made-up trial for a fantasy drug:

1,031 very ill patients were in the trial. 534 were randomly assigned to the treatment group (an active drug in a red pill tasting vaguely of asparagus) and 497 were randomly assigned to the placebo group (a non-pharmacologically-active red pill tasting vaguely of asparagus). The status of each patient was tested at the end of 2 weeks using standard methods to test for life: breath clouding a mirror, fondness for chocolate, hatred for Tony Blair, etc. The results can be represented in the following table.

 
         Treatment   Placebo   Total
Alive          503       432     935
Dead            31        65      96
Total          534       497   1,031

So is the treatment having a statistically significant effect on the likelihood that a very ill patient will still be capable of hating Tony Blair at the end of the 2 weeks? Is it more than just a random variation?

We need to work out the 'expected' value of each cell under the assumption that the treatment has no effect (the null hypothesis). If that is true, being in the treatment group is exactly the same as being in the placebo group, so the overall figures are our best estimate of what happens to anyone in the trial: 96/1031 = 9.31% of patients died, so you would expect 9.31% of the treatment group (534 patients, so about 49.7) to die, and so on. Let's see a table of expected values.

Expected

         Treatment    Placebo    Total
Alive     484.2774   450.7226      935
Dead       49.7226    46.2774       96
Total          534        497    1,031
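As a rough sketch of how the expected table comes about, the snippet below multiplies each row total by each column total and divides by the grand total. The variable names are purely illustrative; only the four observed counts come from the trial table.

```python
# Observed counts from the trial table: rows are Alive/Dead,
# columns are Treatment/Placebo.
observed = [[503, 432],
            [31, 65]]

row_totals = [sum(row) for row in observed]        # Alive, Dead
col_totals = [sum(col) for col in zip(*observed)]  # Treatment, Placebo
grand_total = sum(row_totals)

# Under "no effect", each cell is expected to hold
# (row total * column total) / grand total patients.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(["Alive", "Dead"], expected):
    print(label, [round(value, 4) for value in row])
# Alive [484.2774, 450.7226]
# Dead [49.7226, 46.2774]
```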

So how far are the actual values from the expected values?

 

         Treatment                     Placebo                       Total
Alive    484.2774 - 503 = -18.7226     450.7226 - 432 = 18.7226        935
Dead     49.7226 - 31 = 18.7226        46.2774 - 65 = -18.7226          96
Total    534                           497                           1,031

Quite a bit. Notice that all the values are the same apart from the sign. Now we square each difference and divide it by its expected value. The square of 18.7226 is the same as the square of -18.7226: 350.5358.

We now work out the statistic. This means dividing the squared difference by the expected value for each cell in the 2x2 table...

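Chi-table

         Treatment                      Placebo
Alive    350.5358 / 484.2774 = 0.7238   350.5358 / 450.7226 = 0.7777
Dead     350.5358 / 49.7226 = 7.0498    350.5358 / 46.2774 = 7.5747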

Each cell is a bit like asking "how far from the expected value is it, given the size of the expected value?"

...and summing these values gives the actual chi-squared statistic for the table: 16.1260.
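The whole calculation, from observed counts to the summed statistic, can be sketched in a few lines. This is only an illustration of the steps described above, with made-up variable names, not a replacement for a proper statistics package.

```python
# Observed counts: rows are Alive/Dead, columns are Treatment/Placebo.
observed = [[503, 432],
            [31, 65]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row_total in enumerate(row_totals):
    for j, col_total in enumerate(col_totals):
        # Expected count for this cell if the treatment has no effect.
        expected = row_total * col_total / grand_total
        # Squared distance from the expected count, scaled by its size.
        chi_squared += (observed[i][j] - expected) ** 2 / expected

print(f"{chi_squared:.4f}")  # about 16.126
```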

This figure is the important bit. We compare it with the chi-squared distribution with 1 degree of freedom. In the plot below, the pink dotted line represents the chi-squared distribution for 1 degree of freedom. What percentage of the area under that curve lies at 16.1260 or above? (This is the p value.) Well, I can tell you it's not much.

The p value for the above situation is about 0.00006 (less than 0.0001). That means that if the treatment really had no effect on the likelihood of being alive 2 weeks after taking this drug, a table at least this far from the expected values would turn up by chance less than 0.01% of the time. Strictly, this is not the probability that you are wrong if you stand up in front of a lot of people and say "This drug reduces the chance of death"; it is the probability of seeing data this extreme if the drug did nothing at all.
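For 1 degree of freedom the area in the tail can be worked out without a table. A chi-squared value with 1 degree of freedom is the square of a standard normal value, so the area at or above x is the same as the chance that a standard normal lands further than sqrt(x) from zero, which is erfc(sqrt(x/2)). The sketch below assumes that shortcut and only applies to the 1-degree-of-freedom case; the function name is made up for illustration.

```python
import math


def chi_squared_p_value_1df(statistic):
    """Upper-tail area of the chi-squared distribution with 1 degree of freedom.

    Uses P(Z**2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)),
    which is valid for 1 degree of freedom only.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


p_value = chi_squared_p_value_1df(16.1260)
print(f"{p_value:.6f}")  # roughly 0.00006
```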

So it's obvious that the actual statement would be better expressed as "The drug has a statistically significant effect on the survival rate of very ill patients after 2 weeks when compared with placebo"

If you use a larger table (n rows by m columns) then you perform the same calculation, but when looking at the chi-squared table you should use (n-1)(m-1) degrees of freedom. For example, a 4x3 table requires that you use the chi-squared table calculated for 6 degrees of freedom.
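A sketch of the same method for a table of any size follows. The function name and the 4x3 counts are hypothetical, invented purely to show the degrees-of-freedom rule; only the 2x2 counts come from the trial above.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    # (rows - 1) * (columns - 1)
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# The 2x2 trial table from above.
print(chi_squared_statistic([[503, 432], [31, 65]]))

# A made-up 4x3 table (hypothetical counts, for illustration only).
made_up = [[10, 20, 30],
           [12, 18, 25],
           [8, 22, 28],
           [15, 15, 35]]
print(chi_squared_statistic(made_up)[1])  # 6 degrees of freedom
```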

Some technical details:

The chi-squared distribution is obtained by repeatedly sampling a normal distribution with mean=0 and variance=1, squaring the values of the individual samples and then adding them up.

The parameter for the chi-squared distribution is the degrees of freedom (call it k). If you have k = 2 then you take two samples from the normal distribution, square them and add up the results. If you do this a lot, the histogram of these sums builds up the chi-squared distribution with k degrees of freedom.

If we look at a low value of k then clearly the sum will not have a very high average value; as k increases, so does the average value of the distribution.
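The recipe above is easy to try out. The sketch below draws k standard normal samples, squares and sums them, repeats that many times and prints a crude text histogram. The helper name, the fixed seed and the bin width of 1 are arbitrary choices made for illustration.

```python
import random


def sample_chi_squared(k, rng):
    """One draw: sum of k squared samples from a normal with mean 0, variance 1."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is repeatable
k = 2
draws = [sample_chi_squared(k, rng) for _ in range(100_000)]

# Count how many draws fall into each unit-wide bin: 0-1, 1-2, ...
counts = {}
for value in draws:
    bin_index = int(value)
    counts[bin_index] = counts.get(bin_index, 0) + 1

for bin_index in range(8):
    share = counts.get(bin_index, 0) / len(draws)
    print(f"{bin_index:>2}-{bin_index + 1:<2} {'#' * round(share * 60)}")

print("average:", sum(draws) / len(draws))  # should land close to k
```

Changing k and re-running shows the bulk of the histogram moving to the right as k grows.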

The chi-squared distribution has a p.d.f. that uses the Gamma function and looks like this:
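For reference, the standard textbook form of that p.d.f. for k degrees of freedom is given below, where the symbols x (the value of the statistic) and f are notation introduced here:

```latex
f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\; x^{k/2 - 1}\, e^{-x/2}, \qquad x > 0,
```

with f(x; k) = 0 for x <= 0.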

Plotting the functions obtained by using values of 10, 20, 5 and 1 for k gives the following graphs.
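To get a feel for these curves without a plotting library, the sketch below evaluates the standard chi-squared p.d.f. on a grid and reports where each curve peaks. The function name and the grid (0.01 to 40 in steps of 0.01) are illustrative assumptions. For k of 2 or more the peak sits at k - 2, while for k = 1 the curve simply falls away from zero, so the reported peak is the first grid point.

```python
import math


def chi_squared_pdf(x, k):
    """Standard chi-squared density with k degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


grid = [i / 100 for i in range(1, 4001)]  # 0.01, 0.02, ..., 40.00

peaks = {}
for k in (1, 5, 10, 20):
    # Grid point where the density is highest for this k.
    peaks[k] = max(grid, key=lambda x: chi_squared_pdf(x, k))
    print(f"k = {k:>2}: curve peaks near x = {peaks[k]}")
```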

...as k tends to infinity the curve becomes more like a normal distribution. Also, the mean value of the chi-squared distribution is k and the variance is 2k.
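A short derivation of those two figures, writing Z_1, ..., Z_k for the independent standard normal samples and X for their sum of squares (notation introduced here), and using the standard facts E[Z^2] = 1 and E[Z^4] = 3:

```latex
X = Z_1^2 + Z_2^2 + \dots + Z_k^2

\mathbb{E}[Z_i^2] = \operatorname{Var}(Z_i) = 1
\quad\Longrightarrow\quad
\mathbb{E}[X] = \sum_{i=1}^{k} \mathbb{E}[Z_i^2] = k

\operatorname{Var}(Z_i^2) = \mathbb{E}[Z_i^4] - \left(\mathbb{E}[Z_i^2]\right)^2 = 3 - 1 = 2
\quad\Longrightarrow\quad
\operatorname{Var}(X) = \sum_{i=1}^{k} \operatorname{Var}(Z_i^2) = 2k
```

Because X is a sum of k independent, identically distributed terms, the central limit theorem is what makes the curve look more and more like a normal distribution (with mean k and variance 2k) as k grows.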