
Chi Square Distribution

The chi square distribution is a probability density function used in statistics to determine the probability that numerical differences in data are significant and real (as opposed to just being due to random errors or chance). Chi is a Greek letter designated by χ. In practice, the χ² distribution is expressed as a family of curves or a table of critical χ² values that correspond to a particular probability level. Tables of these "expected" or "theoretical" χ² values are published in reference books. During a χ² test, an "observed" χ² value is calculated for the data in question and compared to the table values to determine the probability that the numerical event in question is statistically significant.

The chi square distribution is closely related to another commonly used probability density function, the normal distribution, which describes the distribution of many random variables. Normal or Gaussian curves are bell-shaped curves that are symmetrical about the arithmetic mean value and have two tails. The numerical values (e.g., measurements) are along the x-axis. The relative frequency of their occurrence is along the y-axis. Most of the values fall within the wide part of the curve near the mean. The frequency drops as values begin to deviate positively or negatively from the mean. Very low and very high values have a very low frequency of occurrence, and fall along the lower and upper tails of the normal curve, respectively.

Chi square distribution curves look much different, because they show the distribution of the variance of the data, not the data itself. Variance is calculated mathematically by squaring the standard deviation. Thus, χ² values cannot be negative. The chi square distribution is a family of curves, because the curve shape varies depending on the number of degrees of freedom (d.f.) associated with the data in question. Degrees of freedom is an important concept in statistics and can be difficult to understand. Basically, it is a limit on the arbitrariness of a data set. For example, if three numbers sum to 50, defining two of these numbers fixes the value of the third. Thus, there are only two degrees of freedom in this equation.
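The link between the normal curve and the chi square curves can be illustrated with a small simulation. The sketch below relies on a standard result that the article does not state explicitly: the sum of the squares of k independent standard normal values follows a chi square distribution with k degrees of freedom. The function name simulate_chi_square, the trial count, and the seed are illustrative choices, not part of the original article.

```python
import random

def simulate_chi_square(df, trials, seed=0):
    """Draw `trials` values, each the sum of `df` squared standard normal draws.

    Assumption (standard result, not from the article): such a sum follows
    a chi square distribution with `df` degrees of freedom.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(trials)
    ]

samples = simulate_chi_square(df=5, trials=20000)

# Squared values cannot be negative, so no simulated value is below zero.
print(min(samples) >= 0)

# The sample mean should land close to the number of degrees of freedom.
print(sum(samples) / len(samples))
```

Changing df and plotting a histogram of the samples would show how the curve shape shifts with the degrees of freedom.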

The chi square test can be used to test the homogeneity (or uniformity) within a data set or to determine the probability that there is a dependency relationship between two or more distinct data sets (e.g., in industry, the chi square test could be used to determine the probability that a particular machine in a group breaks down too often or that production increases when a raw material is changed).

A goodness-of-fit test based on χ² values determines the probability that a data set fits a defined pattern. For example, consider a large data set with a known frequency distribution. A small subset of it will usually have a somewhat different frequency distribution than the parent group. As the subset size increases, its frequency distribution more closely resembles that of the parent group. The chi square test can be used to determine what sample size will provide a reasonable approximation of the larger set.

The first step in any statistical test is to establish two hypotheses (educated guesses) about the data in question that can be accepted or rejected. The first hypothesis is called the null hypothesis and is designated H0. For a goodness-of-fit test, the null hypothesis is that the data in question follow a particular pattern (e.g., H0: The data set is uniform). The second hypothesis is called the alternative hypothesis and is designated by H1. It is the opposite of the null hypothesis (e.g., H1: The data set is not uniform).

Consider a simple example for testing data homogeneity. A die is rolled 120 times, and the number of occurrences of the values 1 through 6 is x1=17, x2=19, x3=16, x4=18, x5=16, and x6=34. Is this die equally balanced? The hypotheses are as follows:

  • H0: This is a uniform data set. Thus, variations are explained by chance and coincidence.
  • H1: This data set is not uniform, and variations are statistically significant.

The χ² value is calculated from: χ² = Σ((Fo − Fe)²/Fe), where Fo is the observed frequency, Fe is the expected frequency, and the sum runs over all categories. The expected frequency for a uniform distribution is the mean (usually called x̄) of the data set, or 120/6 = 20. Each value would have been expected to occur 20 times during 120 rolls. Σ(Fo − Fe)² is the sum of the squares of the deviations from the mean, or (17 − 20)² + (19 − 20)² + (16 − 20)² + (18 − 20)² + (16 − 20)² + (34 − 20)² = 9 + 1 + 16 + 4 + 16 + 196 = 242. Because Fe is the same for every category, χ² = 242/20 = 12.1. The number of degrees of freedom is one less than the number of categories, or 6 − 1 = 5.
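The arithmetic for the die example can be checked with a few lines of Python. This is a minimal sketch using only the counts given in the article; the variable names are illustrative.

```python
# Observed counts for faces 1 through 6 over 120 rolls (from the example).
observed = [17, 19, 16, 18, 16, 34]

total_rolls = sum(observed)
# Under a uniform distribution every face has the same expected count.
expected = total_rolls / len(observed)

# Squared deviation of each observed count from the expected count.
squared_deviations = [(fo - expected) ** 2 for fo in observed]

# Chi square statistic: sum of (Fo - Fe)^2 / Fe over all categories.
chi_square = sum(sq / expected for sq in squared_deviations)

# One less than the number of categories.
degrees_of_freedom = len(observed) - 1

print(total_rolls, expected)
print(sum(squared_deviations))
print(round(chi_square, 2), degrees_of_freedom)
```

The printed values should match the figures worked out in the text: 120 rolls, an expected count of 20, a sum of squared deviations of 242, a statistic of 12.1, and 5 degrees of freedom.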

Compare this to a table of critical χ² values. For a given d.f., the table provides the critical χ² value associated with a given probability level, i.e., the probability of obtaining a chi square value at least that large simply by chance. These probability levels are often called α values, referring to the area under the upper tail of the corresponding curve. In general, a probability level of 0.05 is considered the threshold of significance in the scientific community. If the probability level associated with a calculated χ² value is less than or equal to 0.05, then the variation in the data is too great to be reasonably attributed to chance alone, and H0 can be rejected with a high degree of confidence.
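The table lookup described above can be sketched as a small dictionary and a comparison. The critical values below are the commonly published upper-tail values at the 0.05 level, rounded to three decimals; the names CRITICAL_0_05 and reject_null are hypothetical and chosen only for illustration.

```python
# Upper-tail critical chi square values at the 0.05 probability level,
# keyed by degrees of freedom (standard published table values, rounded).
CRITICAL_0_05 = {
    1: 3.841,
    2: 5.991,
    3: 7.815,
    4: 9.488,
    5: 11.070,
    6: 12.592,
}

def reject_null(chi_square, df):
    """Return True when the observed statistic reaches the 0.05 critical value."""
    return chi_square >= CRITICAL_0_05[df]

# Die example: observed statistic 12.1 with 5 degrees of freedom.
print(reject_null(12.1, 5))
```

An observed value at or above the critical value corresponds to a probability level of 0.05 or less, which is the rejection condition stated in the text.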

From the chi square table, the critical χ² value for d.f.=5 at the 0.05 probability level is 11.07. The observed value of 12.1 exceeds it, so the probability of obtaining χ² greater than or equal to 12.1 just due to chance, if the null hypothesis is true, is less than 0.05. Therefore, the null hypothesis can be rejected with confidence. The variations in this data set are statistically significant. In other words, there is something "funny" about this die.
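Instead of reading a printed table, the upper-tail probability can also be computed directly. The sketch below is one way to do this with only the Python standard library, assuming a positive integer number of degrees of freedom. It uses the standard recurrence for the regularized upper incomplete gamma function, Q(a + 1, z) = Q(a, z) + z^a e^(−z) / Γ(a + 1), starting from Q(1/2, z) = erfc(√z) or Q(1, z) = e^(−z). The function name chi2_upper_tail is hypothetical and not taken from the article or from any library.

```python
import math

def chi2_upper_tail(x, df):
    """Probability that a chi square variable with `df` degrees of freedom
    is greater than or equal to `x`. Assumes `df` is a positive integer."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-z)               # Q(1, z)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(z))    # Q(1/2, z)
    # Step a upward by 1 until it reaches df / 2, adding one term each time.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Die example: statistic 12.1 with 5 degrees of freedom.
p_value = chi2_upper_tail(12.1, 5)
print(p_value)
print(p_value <= 0.05)
```

For the die example the computed probability falls between 0.025 and 0.05, consistent with the conclusion that the null hypothesis can be rejected at the 0.05 level.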

    Copyrights
    Chi Square Distribution from World of Mathematics. ©2005-2006 Thomson Gale, a part of the Thomson Corporation. All rights reserved.
