A goodness of fit test is a statistical method used to determine whether the observed frequencies in a dataset are consistent with the frequencies expected under a specified distribution. A common example is checking whether a die is fair: you compare the results of rolling the die many times against the uniform distribution that a fair die would follow.
In a typical scenario, you might roll a six-sided die 60 times and record the observed frequency of each outcome (1 through 6). The null hypothesis (H0) states that the die is fair, meaning each face has probability 1/6, so the expected frequency of each outcome is 10 (60 rolls divided by 6 outcomes). The alternative hypothesis (Ha) states that at least one face has a probability different from 1/6.
The test statistic for a goodness of fit test is the chi-squared statistic:
\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]
where \(O_i\) represents the observed frequency for each category, and \(E_i\) is the expected frequency. This formula quantifies the discrepancy between what was observed and what was expected. For instance, if the observed frequency of rolling a 1 is 13, the calculation for that category would be:
\[\frac{(13 - 10)^2}{10} = \frac{9}{10} = 0.9\]
After calculating this contribution for every category, you sum the contributions to obtain the overall chi-squared statistic. Suppose that in our example the six contributions sum to a chi-squared value of 11.2.
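The calculation above can be sketched in a few lines of Python. The observed counts below are hypothetical: the text only states that face 1 came up 13 times and that the total is 11.2, so the remaining five counts were chosen purely for illustration to reproduce those two numbers.

```python
# Hypothetical counts for faces 1..6 from 60 rolls (illustrative only;
# chosen so that face 1 appears 13 times and the statistic comes to 11.2).
observed = [13, 4, 15, 5, 14, 9]

n_rolls = sum(observed)                    # total number of rolls
n_faces = len(observed)                    # number of categories
expected = [n_rolls / n_faces] * n_faces   # equal expected count per face under H0

# Per-category contribution (O_i - E_i)^2 / E_i
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(contributions)

for face, (o, c) in enumerate(zip(observed, contributions), start=1):
    print(f"face {face}: observed={o:2d}  contribution={c:.1f}")
print(f"chi-squared = {chi_squared:.1f}")
```

Printing the individual contributions is useful in practice because it shows which categories drive a large statistic.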
To interpret the chi-squared statistic, you also need the degrees of freedom, calculated as \(k - 1\), where \(k\) is the number of categories. In this case, with 6 categories, there are 5 degrees of freedom. Using statistical tables or software, you can then find the p-value, which is the probability of obtaining a chi-squared value at least as large as the one observed if the null hypothesis is true. For a chi-squared value of 11.2 with 5 degrees of freedom, the p-value is approximately 0.0476.
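As a rough sketch of what the software does, the upper-tail probability can be computed with only the Python standard library. The helper name `chi2_sf` is made up for this illustration; it relies on the standard recurrence for the regularized upper incomplete gamma function, which is adequate for the small integer degrees of freedom used here. In practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable
    with a positive integer number of degrees of freedom `df`."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 1 (a = 1/2) or df = 2 (a = 1).
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(z))
    else:
        a = 1.0
        q = math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

k = 6                      # number of categories (faces of the die)
df = k - 1                 # degrees of freedom
p_value = chi2_sf(11.2, df)
print(f"df = {df}, p-value = {p_value:.4f}")
```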
Finally, you compare the p-value to your significance level (α), which is often set at 0.05. If the p-value is less than α, you reject the null hypothesis; otherwise, you fail to reject it. In this case, since 0.0476 is less than 0.05, you would reject the null hypothesis and conclude that the data provide evidence that the die is not fair. Note that the p-value falls only slightly below the threshold, so the evidence is modest rather than overwhelming.
When conducting a goodness of fit test, check that the sample is random, that the data are counts recorded for every category, and that each expected frequency is sufficiently large (typically at least 5). The last condition matters because the chi-squared distribution only approximates the sampling distribution of the statistic, and the approximation becomes unreliable when expected counts are small. When these conditions hold, the test gives a principled way to assess whether observed data fit a theoretical distribution.
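The expected-count condition is easy to check before running the test. The sketch below is illustrative only: the function names and the 24-roll comparison case are assumptions introduced here, and the threshold of 5 is the rule of thumb quoted above rather than a hard limit.

```python
def expected_counts(n_trials, probabilities):
    """Expected frequency of each category under the null hypothesis."""
    return [n_trials * p for p in probabilities]

def large_enough(expected, minimum=5):
    """True if every expected count meets the rule-of-thumb minimum."""
    return all(e >= minimum for e in expected)

fair_die = [1 / 6] * 6

# 60 rolls give an expected count of about 10 per face, so the condition holds.
print(large_enough(expected_counts(60, fair_die)))

# 24 rolls give an expected count of about 4 per face, so the condition fails.
print(large_enough(expected_counts(24, fair_die)))
```

If the check fails, the usual remedies are to collect more data or to combine sparse categories before applying the test.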