Standardized Tests
Standardized tests are administered to measure the aptitude or achievement of the people tested. The distribution of scores for all test takers lets individual test takers see where their scores rank among others. Well-known examples of standardized tests include "IQ" (Intelligence Quotient) tests, the PSAT (Preliminary SAT) and SAT (originally the Scholastic Aptitude Test) tests taken by high school students, the GRE (Graduate Record Examinations) test taken by college students applying to graduate school, and the various admission tests required for business, law, and medical schools.
The "Normal" Curve
The mathematics behind the distribution of scores on standardized tests comes from the fields of probability theory and mathematical statistics. A cornerstone of this mathematical theory is the "Central Limit Theorem," which states that the sum or average of a large number of independent observations is approximately normally distributed. Because a test score is built up from the results on many individual questions, the scores of a large group of test takers tend to follow the bell-shaped normal probability curve illustrated below. This means that most of the scores will cluster symmetrically around the mean or average value of all the scores, with fewer scores farther away from the mean value.
One measure of the spread or dispersion of the observations is called the standard deviation. According to statistical theory, about 68 percent of all observations in a normal distribution will lie within plus or minus one standard deviation of the mean; about 95 percent will lie within plus or minus two standard deviations of the mean (see graph below); and about 99.7 percent will lie within plus or minus three standard deviations of the mean. Standardized test scores are examples of observations that have this property.
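The three percentages quoted above can be written as one formula. The notation here is an assumption introduced for illustration and does not come from the article: X is a normally distributed observation, \mu is its mean, \sigma is its standard deviation, and \Phi is the cumulative probability function of the normal curve with mean 0 and standard deviation 1.

```latex
P(\mu - k\sigma \le X \le \mu + k\sigma) = \Phi(k) - \Phi(-k) = 2\Phi(k) - 1
\approx
\begin{cases}
0.6827 & k = 1 \\
0.9545 & k = 2 \\
0.9973 & k = 3
\end{cases}
```

The result depends only on k, the number of standard deviations, and not on the particular mean or standard deviation of the observations.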

Consider, for example, a standardized test for which the mean score is 500 and the standard deviation is 100. This means that about 68 percent of all test takers will have scores that fall between 400 and 600; about 95 percent will have scores between 300 and 700; and virtually all of the scores (about 99.7 percent) will fall between 200 and 800. In fact, many standardized tests, including the PSAT and SAT, have used just such a scale on which 200 and 800 are the minimum and maximum scores, respectively, that will be given.
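The score ranges in this example can be checked numerically. The following is a minimal sketch in Python, assuming the scores follow an exact normal curve with mean 500 and standard deviation 100; the function name share_between is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Assumption: test scores follow an exact normal curve
# with mean 500 and standard deviation 100.
scores = NormalDist(mu=500, sigma=100)


def share_between(low: float, high: float) -> float:
    """Fraction of test takers expected to score between low and high."""
    return scores.cdf(high) - scores.cdf(low)


for low, high in [(400, 600), (300, 700), (200, 800)]:
    print(f"{low} to {high}: {share_between(low, high):.1%}")
```

The three printed shares come out near 68, 95, and 99.7 percent, matching the ranges of one, two, and three standard deviations around the mean.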
Scaled Scores
The "standardized" in standardized tests means that the same score must represent the same level of performance from year to year. Statisticians and test creators work together to ensure that if, for example, a student scores 650 on one version of the SAT as a junior and 700 on a different version as a senior, the difference truly represents a gain in achievement rather than one version of the test being more difficult than the other.
By "embedding" some questions that are identical in all versions of a test and analyzing the performance of each group on those common questions, test creators can ensure a level of standardization. If one group scores significantly lower on the common questions, this is interpreted to mean that the lower scoring group is not as strong as the higher scoring group.
If group A scores higher than group B on the questions common to both tests but then scores the same as or lower than group B on the complete test, it would be assumed that the test given to group A was more difficult than that given to group B. Statisticians can develop a mathematical formula that will correct for such a difference in the difficulty of tests.
Such a formula would be applied to the "raw" scores of the test takers in order to obtain "scaled" scores for both groups. These scaled scores could then be compared. A scaled score of 580 on version A means the same thing as a scaled score of 580 on version B, even though the raw scores may be different. In this sense the scores are said to have been "standardized."
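The article does not say which formula test makers use. As one possible illustration, the sketch below applies linear equating, a simple textbook method that matches the mean and standard deviation of raw scores on one version to those on another. It assumes the two groups are of comparable ability, which is the kind of thing the common questions are used to check. The function name and all of the raw scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(raw_a, form_a_scores, form_b_scores):
    """Place a raw score from version A on version B's raw-score scale.

    Assumption: the groups taking each version are comparable in ability,
    so differences in raw-score mean and spread reflect test difficulty.
    """
    mu_a, sd_a = mean(form_a_scores), pstdev(form_a_scores)
    mu_b, sd_b = mean(form_b_scores), pstdev(form_b_scores)
    # Same number of standard deviations from the mean on both versions.
    return mu_b + sd_b * (raw_a - mu_a) / sd_a


# Hypothetical raw scores: version A is harder, so its scores run 4 points lower.
form_a = [38, 42, 45, 47, 50, 53, 55, 58, 62]
form_b = [42, 46, 49, 51, 54, 57, 59, 62, 66]

print(linear_equate(50, form_a, form_b))
```

With these made-up numbers, a raw score of 50 on the harder version A is treated as equivalent to a raw score of about 54 on version B, so both would receive the same scaled score.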
Statistical Scores
A second meaning of "standardized" is more subtle, more mathematically involved, and not well understood by the general public. This meaning has to do with the bell-shaped normal probability curve mentioned at the beginning of this article. Theoretically, there are an infinite number of normal curves, one for each different set of observations that might be made. Mathematicians would say that there is an entire "family" of normal curves, and the members of the normal curve family share similarities as well as differences.
All bell-shaped curves are high in the middle and slope down to long "tails" to the right and left. Although different types of observations will have different mean values, those mean values will always occur at the middle of the distributions. They may also have different standard deviations as discussed earlier, but the percentage of values lying within plus or minus one standard deviation of the mean will still be about 68 percent, the percentage of values lying within plus or minus two standard deviations will still be about 95 percent, and so on.
In order to make the analysis of normal distributions simpler, statisticians have agreed upon one particular normal curve that will represent all the rest. This special normal curve has a mean of 0 and a standard deviation of 1 and is called the "standard normal curve." A "standardized" test result, therefore, is one based on the use of a standard normal curve as its reference.
The advantage of having the standard normal curve represent all the other normal curves is that statisticians can then construct a single table of probabilities that can be applied to all normal distributions. This can be done by "mapping" those distributions onto the standard normal curve and making use of its probability table. The term "mapping" in mathematics refers to the transformation of one set of values to a different set of values.
To illustrate, consider the test with a mean of 500 and a standard deviation of 100. The mean of this set of scores lies 500 units to the right of the standard normal distribution's mean of 0. So to "map" the mean of the test scores onto the standard normal mean, 500 is subtracted from all the test scores. Now there is a new distribution with the correct mean but the wrong standard deviation.
To correct this, all of the scores in the new distribution are divided by 100, since $100 \div 100 = 1$, which is the standard deviation of the standard normal distribution. The two distributions are now identical. In mathematical terms the test scores have been "mapped" onto the standard normal values.
This mapping is composed of two transformations: a translation of 500 to the left and a scale change of 1/100. This composition can be represented by $z = \frac{x - 500}{100}$, where x is any test score and z is the corresponding standard normal value.
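The two-step mapping described above (subtract the mean, then divide by the standard deviation) can be sketched in a few lines of Python. The function name standard_score and the sample scores are hypothetical; the default mean of 500 and standard deviation of 100 are taken from the example test.

```python
def standard_score(x: float, mean: float = 500.0, sd: float = 100.0) -> float:
    """Map a test score onto the standard normal scale."""
    shifted = x - mean   # translation: moves the mean to 0
    return shifted / sd  # scale change: makes the standard deviation 1


# Hypothetical test scores on the 200-800 scale.
test_scores = [300, 500, 650, 800]
print([standard_score(x) for x in test_scores])  # [-2.0, 0.0, 1.5, 3.0]
```

A score equal to the mean maps to 0, and each 100 points above or below the mean corresponds to one unit on the standard normal scale.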
Building on this example, suppose one wants to know the percentage of test takers who scored 650 or above. First, compute $\frac{650 - 500}{100} = 1.5$. Then go to a standard normal table, look up a standard score of 1.5, and see that about 6.68 percent of standard normal scores are at 1.5 or above. This means that about 6.68 percent of the test scores are 650 or higher. This procedure may be used with any normally distributed data set for which the mean and standard deviation are known.
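The table lookup can also be done by computer. The sketch below is one way to do it in Python, using the standard library's NormalDist in place of a printed standard normal table; the function name percent_at_or_above is hypothetical, and the data are assumed to be exactly normally distributed.

```python
from statistics import NormalDist


def percent_at_or_above(score: float, mean: float, sd: float) -> float:
    """Percentage of a normal distribution lying at or above the given score."""
    z = (score - mean) / sd            # map the score to a standard score
    standard_normal = NormalDist()     # mean 0, standard deviation 1
    return (1 - standard_normal.cdf(z)) * 100


print(f"{percent_at_or_above(650, 500, 100):.2f} percent")
```

For a score of 650 on a test with mean 500 and standard deviation 100, this prints a value of about 6.68 percent, the figure a standard normal table gives for a standard score of 1.5.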
See also Central Tendency, Measures Of; Mapping, Mathematical; Statistical Analysis; Transformations.
Bibliography
Angoff, William. "Calibrating College Board Scores." In Statistics: A Guide to the Unknown, ed. Judith Tanur, Frederick Mosteller, William H. Kruskal, Richard F. Link, Richard S. Pieters, and Gerald R. Rising. San Francisco: Holden-Day, Inc., 1978.
Blakeslee, David W., and William Chin. Introductory Statistics and Probability. Boston: Houghton Mifflin Company, 1975.
Narins, Brigham, ed. World of Mathematics. Detroit: Gale Group, 2001.