THE TEACHER'S STATISTICAL PACKAGE

by George W. Goth

These programs are shareware. You may use them as much as you like for as long as you like. You may also give as many copies as you like to any friends, post them on bulletin boards, use them as party favors and so forth. If you find them useful, would you eventually please send $10.00 to

    George Goth
    Skyline College
    3300 College Drive
    San Bruno, CA 94066

Those contributing their share will become registered users and will receive updates when they become available.

- - - - - - - - - - -

GETTING STARTED

It is STRONGLY recommended that you make a back-up copy of the disk and put the original in a safe place. The disk is not copy protected. Be sure to copy BOTH sides of the disk.

Once you have made the backup copy, boot it and in a few moments the monitor screen will display the following:

    THE TEACHER'S STATISTICAL PACKAGE

    WOULD YOU LIKE TO CALCULATE:

    M: MEANS AND STANDARD DEVIATIONS
    T: T-TEST COMPARISON OF MEANS
    A: ANALYSIS OF VARIANCE
    C: CHI-SQUARED ANALYSIS
    L: LEAST SQUARE FITS
    R: MULTIPLE REGRESSIONS
    S: SEE MENU OF MORE PROGRAMS
    Q: QUIT THE PROGRAMS

    PLEASE ENTER A LETTER < >

This is called the PROGRAM MENU and you use it to select the statistical programs. At the bottom of the screen you see the phrase "PLEASE ENTER A LETTER < >" with a flashing box, called the cursor, inside the brackets. You select the program you wish to run by typing the letter corresponding to the program. For example, typing the letter M will run the program that calculates means and standard deviations, and so forth. You need not type a carriage return after typing the letter.

[A NOTE ON ENTERING INFORMATION INTO THE COMPUTER: To use these programs you have to enter two types of information, choices and data. CHOICES are just that; you are given a list of options and you choose one. Whenever you must make a choice, the available choices are either indicated by a list (or menu) or will be a "Y" (for yes) or "N" (for no) to a question (such as "SEND RESULTS TO PRINTER? < >"). Choices are made with single keystrokes with no carriage return. DATA consists of the actual numerical results you want to analyze and the names you give the data. To enter data, type in the correct number or name and then type a carriage return. If you have any qualms about whether to type a carriage return, enter the information and wait a second - if the computer does not immediately do something, it means it is waiting for a carriage return.]

MEAN AND STANDARD DEVIATION

Type a capital letter M to select the Mean and Standard Deviation Program and we will begin our tour of statistics.

If you acquire data, such as by making measurements of something or grading a group of tests, you will almost never find that all the measurements (or tests) have the same value. Rather the values range from a minimum to a maximum, with perhaps a clustering toward the middle. What we would want to know about this data are: a) what is the best representation of its average value; and b) what is the best representation of its variability (i.e. are the values tightly clustered together or spread out).

There are several ways of expressing the average, the three best known being:

a) the mean (also called the arithmetic mean), which is the sum of all the values divided by the number of values. Thus the mean of 1, 2, 3 and 3 is

    mean = (1 + 2 + 3 + 3) / 4 = 2.25

The mean is the most useful of the averages.
If the data collected was from the entire population, the mean is called the population mean; if the data represent only a portion of the population, the mean is the sample mean. We will generally be working with sample means, often designated by an x with a line over it (called x bar).

b) the median, which is the middle value; that is, half the other values are smaller than it and half are larger. For our data from a) the median is 2.5.

c) the mode (from the French for 'fashionable'), which is the value that occurs most often. For our data the mode is 3, as there are two 3's. If two different values occur with the same maximum frequency (as in the case 1, 2, 3, 3, 4, 5, 5), the distribution is bimodal.

There are also several measures of the variability (or dispersion) of the data. The three best known are:

a) the range, the difference between the maximum and minimum values. For our example from above, the range would be 3 - 1 = 2. The range is not a particularly useful measure of dispersion. For instance, you could have a thousand 3's and a single value of 53, and the range would be 50. The range is also 50 for a single 3 and a single 53.

b) the mean deviation. If you were to subtract the mean value from each of the data points (to calculate the deviation of each point) and sum the result, you would get zero:

    1 - 2.25 = -1.25
    2 - 2.25 =  -.25
    3 - 2.25 =   .75
    3 - 2.25 =   .75
         sum     0

The mean deviation is the sum of the absolute values of the deviations, divided by the number of values. In this case it is (1.25 + .25 + .75 + .75) / 4 = .75. The mean deviation tends to underestimate the variability in many cases.

c) the variance, which is the sum of the squares of the individual deviations divided by the number of observations less 1. The positive square root of the variance is called the standard deviation, s. For our sample the standard deviation is

    s = sqrt( ((-1.25)^2 + (-.25)^2 + (.75)^2 + (.75)^2) / (4 - 1) ) = .96

The standard deviation is the most useful of the measures of dispersion, as we will see later.
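If you would like to check these definitions by hand on another machine, here is a minimal sketch in modern Python (Python is not part of this package; the disk programs do all of this for you) that reproduces the numbers above:

    import statistics

    data = [1, 2, 3, 3]

    mean = sum(data) / len(data)            # arithmetic mean: 2.25
    median = statistics.median(data)        # middle value: 2.5
    mode = statistics.mode(data)            # most frequent value: 3
    data_range = max(data) - min(data)      # maximum minus minimum: 2

    deviations = [x - mean for x in data]   # these always sum to zero
    mean_dev = sum(abs(d) for d in deviations) / len(data)   # .75

    # the variance divides the squared deviations by n - 1;
    # its positive square root is the standard deviation s
    variance = sum(d * d for d in deviations) / (len(data) - 1)
    s = variance ** 0.5

    print(mean, median, mode, data_range, mean_dev, round(s, 2))   # s is .96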
With those definitions in mind, let's use the program to calculate some results. When you pressed the letter M, after a few moments you saw the following on the monitor screen:

    MEAN AND STANDARD DEVIATION
    (C) 1986 GEORGE GOTH   V 1.00

    I: INPUT MODULE
    E: EDIT MODULE
    C: CALCULATE MODULE
    D: DISK MODULE
    Q: QUIT

    PLEASE ENTER A CHOICE < >

All the programs are constructed around modules which allow you to enter data (INPUT), change data (EDIT), calculate results (CALCULATE) or store them on disk (DISK). Since no data has been entered yet, let's do that by typing I (no carriage return) to get to the INPUT MODULE.

Once in the Input Module you are asked for a NAME OF DATA SET. This is a tag or ID for the data. If you don't have a name, press return. However, let's assume that you want to know the averages of the ages of a group of mathematics students, so name the data set MATH STUDENT AGES. Now you see a line saying "Point #1" with a flashing cursor after it. You are being asked to enter the age of the first student. Suppose the ages were

    18, 17, 19, 19, 22, 23, 23, 18, 20, 22

Enter each of these numbers (followed by a carriage return) and type the word END to end the data entry and return to the Main Menu.

Suppose you think you made an error in the data entry. Now choose E from the Main Menu to go to the EDIT MODULE. If you do this you will see a new screen, and across the bottom are the words:

    View Data    Order Data    Change Data
    Main Menu    Delete Data

The View Data Option (V) allows you to look at the data to see if you made a mistake. You will be asked if you want to send the data to the printer for a hard copy (Y for yes, N for no). If there are more than 10 data points, the program will pause at every 10th value. The points are numbered sequentially (the first age of 18 is given the number 1, etc).

The Change Data Option (C) allows you to change either the Data Set Name or the Value of a point by selecting either D or V. If you wish to change a value, enter the number of the point you wish to change; the value will be displayed, and you either type in a new value (to correct it) or press return (to keep it).

The Delete Data Option (D) allows you to remove a value from the data. If you wish to do this, type the number of the point and, after its value is displayed, type either a Y for yes or an N for no to delete or keep the point. If a point at the beginning or in the middle of the data is deleted, all the others are renumbered. Thus, if you deleted the first 18, the value of point 1 would become 17.

The Order Data Option (O) allows you to arrange the values from lowest to greatest. This is useful if you want to draw a graph of the data later.

The Main Menu Option (M) returns you to the Main Menu when you have completed your editing.

After editing the data, return to the Main Menu and choose C for the Calculation Module. Before the calculation is done, you will be asked if you want to send the results to the printer (answer Y or N) and if you want to calculate the median and mode [1]. Let's calculate the median and mode in this case. In a few moments, the following appears:

    NAME OF DATA SET: MATH STUDENT AGES
    NUMBER OF DATA POINTS: 10
    MEDIAN: 19.5
    MODE: 23
    MEAN: 20.1
    STANDARD DEVIATION: 2.23358208
    ------------------------------------
    SAVE MEAN AND STD. DEV.    Z SCORES
    MAIN MENU

    PLEASE ENTER A CHOICE < >

The Save Mean and Standard Deviation Option (S) allows you to save the mean and standard deviation [2] in memory for use later. When you press S, the computer will beep, indicating the data has been saved.

The Z Scores Option (Z) will display a set of statistics called, most appropriately, the z scores. The z score for a point is:

    z = (value of point - mean) / standard deviation

Thus the z score for an age of 18 in this collection of ages would be:

    z = (18 - 20.1) / 2.23358208 = -.9401

The closer the size of the z score is to zero, the closer the point is to the mean value. Z scores are discussed more fully below.

If you have not already done so, save these results using the S option and return to the Main Menu. Choose the Input option. Notice that you are warned that data are already in memory. Since you are about to calculate another mean, clear it. Now do the following problem.

Problem: A group of 12 students in an English class have the following ages: 22, 23, 21, 19, 18, 22, 20, 18, 23, 24, 18, 23. Calculate the mean and standard deviation of the ages of this group. Save the results when you are done and return to the Main Menu. [Answer: mean is 20.9166667, standard deviation is 2.23437335.]
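For those curious, the z score arithmetic is easy to reproduce outside the package. A minimal sketch in Python (not part of this package), using the math student ages from above:

    import statistics

    ages = [18, 17, 19, 19, 22, 23, 23, 18, 20, 22]
    mean = statistics.mean(ages)     # 20.1
    s = statistics.stdev(ages)       # 2.2335... (the sample standard deviation)

    # z = (value of point - mean) / standard deviation, for every point
    for age in ages:
        z = (age - mean) / s
        print(age, round(z, 4))      # an age of 18 gives a z of about -.94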
You are now ready to explore the wonders of the Disk Option (D). When you press D, you see the following:

    S: Save to Disk
    F: Fetch from Disk
    C: Catalog Disk
    L: Lock a File
    U: Unlock a File
    R: Rename a File
    D: Delete a File
    M: Main Menu

The Save to Disk Option (S) allows you to save data on the floppy disk. The disk must have sufficient room on it for the data to be stored, so you should have a ProDOS formatted disk named STAT.PRODOS available. You will be asked whether you wish to save Raw Data (the data values only) or a Set of Means (the means and standard deviations of the data sets you have stored in memory, along with their data set names). Choose S to save the set of means, for we will be using them later, and give the set an appropriate name [3], such as STUDENT AGES.

The Fetch from Disk Option (F) allows you to recover raw, unanalyzed data from the disk.

The Catalog Option (C) allows you to examine the contents of the disk to see what programs and files are stored on it.

The Lock a File Option (L) allows you to "lock" a file, that is, prevent another file with an identical name from replacing it. The Unlock a File Option (U) unlocks the file, so that another file with the same name can replace it or so that it can be deleted.

The Rename a File Option (R) allows you to give the file a new name. The Delete a File Option (D) permanently removes the file from the disk.

When done, return to the Main Menu and choose Q (for Quit), which will get you back to the program menu. You can either take a break now or go on to the next section.

FOOTNOTES ON MEAN

[1] If there are several modes, the program will find only the one with the largest value. Median and mode calculations involve ordering and ranking the data, and can take a considerable time if there is a large number of data points.

[2] The program will also store the sum of the squares of the data. This information is used in the ANOVA program.

[3] See the list of file suffixes at the end of this manual.

WHAT ARE WE GOING TO DO WITH THE MEAN?

Now that we know how to calculate the mean and standard deviation, what are we going to do with them? In the next several programs we will compare one mean with another (or others) to see if they could be representatives of the same population. Generally, two (or more) samples of different populations will not have exactly the same values for their means, but because the data are dispersed (as measured by the standard deviation), it could well be possible that the populations do have the same mean. So we wish to develop a series of tests, using the samples drawn from the populations, for determining if the two (or more) populations have identical means.

The method of doing this is called hypothesis testing. A hypothesis is a statement or claim about the nature of the populations. We will focus on what is called the null hypothesis (designated H0) and develop tests for accepting or rejecting this null hypothesis as true or false.

As an example, suppose we wish to see if a coin is 'fair', that is, if we tossed it a very large number of times it would come up heads half the time and tails the other half. But we don't have the time to flip it a very large number of times; we can only flip it, say, 50 times. We do so and it comes up heads 26 times and tails 24 times. Is it truly a fair coin? Or, to put it into statistical jargon, is

    H0: p(heads) = 0.5

to be accepted or rejected? (The symbols mean "the null hypothesis is that the coin will come up heads 50 per cent of the time.")
If we can reject the null hypothesis, then we accept what is called the alternative hypothesis, which in our case would be that the coin does not come up heads 50 per cent of the time. It is obvious from our results (26 heads, 24 tails) that it did not come up heads exactly 50 per cent of the time, but it would not be unreasonable to say that even if we did have a strictly fair coin and flipped it 50 times over and over, some of those times we would get 26 heads and 24 tails. If this result occurred fairly often (say 5 times out of 100), then we could accept that our coin was fair at the 5 per cent level of significance. If a coin came up with 26 heads and 24 tails only very occasionally (say 1 time out of 1,000,000), we would be very doubtful that our coin was truly a fair one (there would be only a one in a million chance that it was); in other words, we could reject the null hypothesis.

We do not have the space here to discuss, much less to develop, the theories behind the procedures that are used to test hypotheses. As the programs were designed to carry out the calculations and state the results, we will present merely a brief discussion of the conditions under which the tests are used. However, we must begin with a short discussion of the term "normal distribution."

For many phenomena, the data are arranged in what is popularly called a 'bell shaped curve.' The curve is more properly called a normal curve, or a Gaussian curve, and it is determined by the value of its mean and its standard deviation; that is, if you know these two values, you can calculate the value of the normal distribution at any point. The mean determines where the central maximum lies and the standard deviation determines the 'flatness' of the curve; that is, if the standard deviation is small, the normal distribution becomes spike-like; if the standard deviation is large, the normal distribution becomes flattened.

If a set of data are normally distributed, then the probability that a value x will be found in the data set is directly related to the value of the normal curve at x. For instance, if the mean of a set of normally distributed data is 10 and the standard deviation is 2, then the probability of finding a data point of value 14 or more is 0.0228 (2.28 per cent). For the same data, the probability of finding a data point of value 16 or more is 0.0013 (0.13 per cent). We will be assuming for the first several programs that the data are normally distributed.

As for the tests we use to test the hypotheses, they are:

a) z scores. As mentioned above, a z score is given by

    z = (value of point - mean) / standard deviation

Therefore a point far from the mean (assuming the standard deviation is small) will have a large z score. Since large z scores imply small probabilities, the likelihood of finding such a point will be small. For instance, the probability of finding a point with a z of 3.09 or more is only 0.1 per cent.

As an example, it can be shown that if you flip a fair coin 500 times, it will come up heads a mean of 250 times with a standard deviation of 11. If we flip a coin 500 times and it comes up heads 280 times, is it a fair coin? First we calculate the z score:

    z = (280 - 250) / 11 = 2.73

If we look this value up in the appropriate table (found in any elementary statistics book), we will find that it corresponds to a probability of 0.0032 (0.32 per cent). That is, if we were to flip a truly fair coin for 10,000 cases of 500 times each (a total of 5 million flips!), only in about 3 cases would we come up with 280 (or more) heads. Thus it would be reasonable to reject the null hypothesis that our coin (the one with a z of 2.73) was a fair one. The z scores are used to test the null hypothesis in the MANN-WHITNEY U, SIGN, RUNS and RANK CORRELATION programs.
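If you have no z table handy, the tail probability can be computed from the error function, which is how such tables are generated in the first place. A minimal sketch in Python (not part of this package) for the coin example above:

    import math

    def upper_tail(z):
        """Probability of a standard normal value at or beyond z."""
        return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

    mean = 250       # expected number of heads in 500 flips of a fair coin
    std_dev = 11     # roughly the square root of 500 * 0.5 * 0.5
    observed = 280

    z = (observed - mean) / std_dev              # about 2.73
    print(round(z, 2), round(upper_tail(z), 4))  # 2.73, about .0032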
b) Student's t. When we have two normally distributed sets of data, with known means and standard deviations, we can test to see if the means are the same (i.e. the null hypothesis is that the means are the same) at a given level of significance by computing a particular statistic called t and checking whether the value of t occurs with the level of significance (or better) we desire. This method of comparing means was developed by William Gosset, who published his results under the name Student. The particular value of t will also depend on what is called the 'degrees of freedom' (df), which is related to the sample size.[1] The t test is used in the programs T-TEST COMPARISON OF MEANS and ANOVA (what is termed a 'multiple t test' is used in the second).

c) the F test (named after Sir Ronald Fisher), which is used when three or more sets of data are to be compared to see if all the means are identical. It is used in the program ANOVA (which stands for analysis of variance).

d) the chi squared test. The chi squared test is used in several statistical applications. In these programs the chi squared test is used to test hypotheses for data which may not be normally distributed and, in fact, may not even be ordinal (that is, have true numerical values). For instance, we assign in our classes the grades A, B, C and so forth. These letters are not ordinal (they are what is termed nominal, that is, used in naming classes, and could be replaced by 1, 2, 3 etc), much less normally distributed, and yet it is still possible to do statistics on such data sets.

With all that in mind, let's check some hypotheses.

FOOTNOTE

[1] If the degrees of freedom are 30 or more, the value of t is approximately equal to the value of z.

T-TEST COMPARISON OF MEANS

The first statistical test we will carry out is Student's t test for comparing means. Boot the disk (if it is not on already) and from the program menu choose T to select T-TEST COMPARISON OF MEANS. Shortly you will see a menu very much like that used in the MEAN program (all but one of these programs have this as their main menu). Since we have already calculated some means and stored the result as the file AGES, let's analyze that. Type D to get to the disk menu and then F to fetch a file. Type the word AGES to load that file and then M to return to the main menu. Then type C for calculation and in a few moments you will see the following result:

    NAME OF DATA SET: AGES
    NUMBER OF VALUES: 2

    #  NAME       NO  MEAN      STD. DEV.
    1  ENGLISH    12  2.09E+01  2.23E+00
    2  MATH       10  2.01E+01  2.23E+00

    STD. ERROR OF DIFFERENCE: 1.00E+00
    T-TEST: 8.14E-01    DEG. OF FREE.: 20
    NULL HYPOTHESIS: MEANS ARE THE SAME
    ACCEPT NULL HYPOTHS AT 5.0% LEVEL OF SIG

Notice that strange looking numbers (like 2.09E+01) occur. This is because some of the numbers are expressed in exponential (or scientific) notation. This allows the program to fit numbers of various lengths (for example 2.09 or .0000289) in slots of the same size on the screen and makes for a pleasant looking format.
In case you are not familiar with this notation, the 'E' means 'times 10 raised to the power of', so:

    2.09E+01 is the same as 2.09 x 10^1  = 20.9
    2.23E+00 is the same as 2.23 x 10^0  = 2.23
    8.14E-01 is the same as 8.14 x 10^-1 = 0.814

The program prints out the set names, means and standard deviations for each of the data sets compared, along with the standard error of the difference (a statistic used in the t test), the value of t and the degrees of freedom. Of most interest to us are the last two lines, the statement of the null hypothesis (the means are the same) and the conclusion, which in this case is that we should accept the null hypothesis at the 5.0 per cent level of significance.

Note that if the null hypothesis is accepted at the 5.0 per cent level of significance, it would also be accepted at the 10 per cent, 20 per cent, 50 per cent etc levels of significance - at any level more than 5.0 per cent. To find out if it would be accepted at, say, 2 per cent, you would have to look up the value of t at 2 per cent for 20 degrees of freedom and see if it were greater than our calculated t (0.814), in which case our hypothesis would be accepted at 2 per cent, or less than 0.814, in which case it would be rejected. The program will only accept or reject at the levels of 5.0 per cent, 1.0 per cent and 0.1 per cent, which are the most commonly used levels in hypothesis testing with the t test.

Let's add some more data. Return to the main menu by pressing M, then get to the Input Module with an I, indicate that you do not wish to clear the current data with an N, and then add the following data for a sample of ages of telecommunications students: Name = TELE COMM, number = 17, mean = 29.6, standard deviation = 3.8. Then type END to end the input module and then C to get back to the calculation.

Now the program will halt and ask you which of the three means you wish to compare. [If you can't remember which number corresponds to which mean, return to the main menu and use the Edit module to view the data.] Let's compare 1 and 3 (that is, the ages of English students with those of telecommunications students). So type 1 (return) and 3 (return) and you will see:

    NAME OF DATA SET: AGES
    NUMBER OF VALUES: 3

    #  NAME        NO  MEAN      STD. DEV.
    1  ENGLISH     12  2.09E+01  2.23E+00
    3  TELE COMM   17  2.96E+01  3.80E+00

    STD. ERROR OF DIFFERENCE: 1.27E+00
    T-TEST: 6.58E+00    DEG. OF FREE.: 27
    NULL HYPOTHESIS: MEANS ARE THE SAME
    REJECT NULL HYPOTHS AT 0.1% LEVEL OF SIG

This time we see that we are to reject the null hypothesis and conclude that there is a difference in the ages of the two types of students at the 0.1 per cent level of significance. If the two means were really the same, there would be less than a 1 in 1000 chance of seeing a difference this large.
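The textbook version of this calculation pools the two sample variances. Here is a minimal sketch in Python (not part of this package); the disk program's own internal formula evidently differs slightly in the trailing digits (it prints a t of .814), but the conclusion is the same:

    import statistics

    english = [22, 23, 21, 19, 18, 22, 20, 18, 23, 24, 18, 23]
    math_ages = [18, 17, 19, 19, 22, 23, 23, 18, 20, 22]

    n1, n2 = len(english), len(math_ages)
    m1, m2 = statistics.mean(english), statistics.mean(math_ages)
    v1, v2 = statistics.variance(english), statistics.variance(math_ages)

    # pool the two sample variances, weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

    t = (m1 - m2) / std_error
    df = n1 + n2 - 2                 # 20 degrees of freedom

    print(round(t, 3), df)           # t is about .85
    # The critical t for 20 df at the 5% level is 2.086; since our t is
    # far below it, we accept the null hypothesis that the means are the same.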
ANALYSIS OF VARIANCE

Suppose you want to compare more than two means; how would you do it? You could, perhaps, do a series of t-tests between the individual means, but there are both practical and theoretical reasons for not doing this. Practically, even a few means to compare would result in many t-tests (9 different means would result in a total of 36 different tests, which, even with a computer, would take a long time). Also, when doing it by pairs other complications arise; in particular, the possibility of what is called a Type I error is increased [1]. The preferred method of comparing more than two means is called an Analysis of Variance (ANOVA), which, not surprisingly, analyzes the variance between several means.

Without going into the details, the ANOVA computes two statistics: the Within Groups Sum of Squares (SSE), a measure of the variability within the samples because of their standard deviations, and the Between Groups Sum of Squares (SSB), a measure of the variability between the samples due to the differences in their means. Appropriate numbers of degrees of freedom and a statistic F are then computed. The null hypothesis that all the means are the same is then rejected or accepted at a level of significance by comparing the calculated F with appropriate values from a table. The levels of significance tested by this program are 5 per cent and 1 per cent, the most commonly used ones.

For instance, suppose we wished to see if school attendance varied with the day of the week. On 18 randomly selected days we take attendance and get the following results:

    Mondays:    143, 128, 110
    Tuesdays:   162, 136, 144, 158
    Wednesdays: 160, 132, 180, 160, 138
    Thursdays:  138, 168, 120
    Fridays:    110, 130, 135

We would first need to calculate the means, standard deviations and an additional term called the sum of the squares, which is done for us by the MEAN program. After we calculate these values and store them on disk, we can use the ANOVA program to analyze the results.

We get to the ANOVA program by typing A from the Program Menu. After we are in the ANOVA program, we fetch the data on the means we have stored by using the Disk module and then go to the Calculate module. We are asked how many of the means we wish to compare, and since we want to analyze all of them, we type 5. Shortly afterwards, the program displays:

    NAME OF DATA SET: ATTENDANCE
    NUMBER OF DATA VALUES: 5

    #  NAME        #  NAME
    1  MONDAY      2  TUESDAY
    3  WEDNESDAY   4  THURSDAY
    5  FRIDAY

    SOURCE OF VAR  DF  SS        MS
    BETWEEN GRPS    4  2.52E+03  6.29E+02
    AMONG GRPS     13  4.00E+03  3.08E+02
    TOTAL          17  6.52E+03

    F = 2.05E+00
    NULL HYPOTHSIS: ALL MEANS ARE SAME
    ACCEPT NULL HYPOTHSIS AT 5% LEVEL OF SIG

The result, perhaps surprisingly, is that the null hypothesis is accepted at the 5 per cent level of significance. There is no evidence that the attendance varies with the day of the week.

[Problem: For a biology class project a student grows bean plants utilizing four different fertilizers. She then collects and weighs the beans produced by each plant. From the data given below, determine if the fertilizers have any effect on the yield of the beans. Also determine which, if any, of the fertilizers produce significantly different yields.

    Fertilizer    Yield (in kilograms)
    Macro-gro     3.7, 4.4, 3.8, 4.2
    Multi-yield   3.6, 3.5, 4.1, 3.9, 3.8
    Poly-bean     6.3, 5.9, 4.8, 6.0, 5.8, 4.9
    Legume-orama  2.9, 3.3, 2.8

[Answer: Using the MEAN program to calculate the means and other needed statistics and then the ANOVA program to do the analysis, we conclude that, at the 1 per cent level of significance, the null hypothesis that the mean yields are the same should be rejected. But how do we answer the second part, that is, how do we determine which, if any, of the fertilizers differ from each other? To do this, we use the T-test Option (T) at the bottom of the Calculation Module screen. This allows us to compare any two means using a multiple t-test. If we compare Macro-gro (1) with Multi-yield (2), we see that at the 5 per cent level of significance, the means are the same. On the other hand, the yields of Macro-gro and Poly-bean (3) are different at the 0.1 per cent level of significance, and so on.]

FOOTNOTES ON ANOVA

[1] A Type I error is when the null hypothesis is true but is incorrectly rejected. A Type II error is when the null hypothesis is false but is incorrectly accepted. Most null hypothesis testing is based on Type I errors.
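Here is a minimal one-way ANOVA sketch in Python (not part of this package) for the attendance data; it reproduces the sums of squares and F shown above:

    import statistics

    groups = {
        "Monday":    [143, 128, 110],
        "Tuesday":   [162, 136, 144, 158],
        "Wednesday": [160, 132, 180, 160, 138],
        "Thursday":  [138, 168, 120],
        "Friday":    [110, 130, 135],
    }

    all_values = [v for g in groups.values() for v in g]
    grand_mean = statistics.mean(all_values)

    # between-groups sum of squares: each group's size times the squared
    # distance of its mean from the grand mean
    ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
              for g in groups.values())

    # within-groups sum of squares: each value's squared distance from
    # its own group mean
    sse = sum((v - statistics.mean(g)) ** 2
              for g in groups.values() for v in g)

    df_between = len(groups) - 1                # 4
    df_within = len(all_values) - len(groups)   # 13

    f = (ssb / df_between) / (sse / df_within)
    print(round(ssb), round(sse), round(f, 2))  # about 2517, 4000, F = 2.05
    # The critical F for (4, 13) df at the 5% level is 3.18; since
    # 2.05 < 3.18, the null hypothesis that all means are the same is accepted.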
THE METHOD OF LEAST SQUARES

In the section on ANOVA we learned how a group of data sets could be compared to learn if their means were the same within a certain error. Thus, for instance, if a student placed varying amounts of fertilizer on bean plants and measured the yields, she could determine if all the means were the same, or if not, which were different. However, we can extract more information from this type of experiment. We can learn if the amount of fertilizer and the yield are 'linearly correlated'. The term linearly correlated means that if the amount of fertilizer is increased, the yield increases (or perhaps decreases) by a fixed amount; that is, the two variables are related by a straight line.

A straight line is fully characterized by its slope and intercept. The slope is the amount the dependent variable (in this case the yield) changes when the independent variable (in this case the amount of fertilizer) increases by one unit, and is generally given the letter 'm'. The intercept is the value of the dependent variable when the independent variable has a value of 0, and is most often given the symbol 'b'. The independent variable is generally given the letter 'x' and the dependent the letter 'y'. The equation for the straight line is then:

    y = m * x + b

(the * symbol means multiply).

We could attempt to find the relationship by plotting the dependent variable (yield) on the vertical axis (the ordinate) and the independent variable on the horizontal axis (the abscissa) of a graph, drawing a straight line through the data points and then extracting the slope and intercept by hand. However, the question would then be: "Since an infinite number of straight lines can be drawn on the graph, how do we know we have drawn the one that best represents the correlation?" In other words, how do we know we have drawn the 'line of best fit'? The line of best fit tells us if the dependent variable (the yield in this case) is directly affected by the independent variable (the amount of fertilizer).

The line of best fit is calculated by a method called 'least squares' [1], and the least squares program on this disk does this and reports the slope and intercept of the best fit line. A statistic, called Pearson's r, is also calculated. Pearson's r is a measure of the 'goodness' of the fit, and it has the following properties: a) if r = 1 the data are totally correlated; b) if r = 0 then either there is no relationship between x and y, or the relationship is not linear; c) the closer the value of r is to 1, the more closely the calculated values lie to the measured ones. An r of .9 indicates a good fit. [2]

Choose L from the program menu and we can use it to analyze the following data. A student puts varying amounts (in grams) of fertilizer on a series of bean plants. At a fixed time, she harvests the beans and measures the yield (in grams). She obtains the following data:

    fertilizer (grams)   .50  1.0  2.0  3.6  5.0  7.5
    yield (grams)       11.8 15.0 14.5 22.0 26.5 30.5

What is the equation for the best line relating the yield and fertilizer? What is the value of r?

We choose the Input module, give the data set the name BEAN PROJECT, the Independent variable the name FERTILIZER (GRAMS), the Dependent YIELD (GRAMS), then enter the numerical data and end with an END.
Then from the main menu we choose G for graph and then L for a linear fit [3], the program calculates the results and.... Lo and behold! A graph! Yes, the program draws a scatter diagram of the data and the best straight line. Plus, it gives us, at the bottom of the screen, the values for the slope, intercept and fit! We can get a hardcopy of these results (the values, not, alas, the graph) by pressing any key and then typing an R (for send Results to printer). This gives us:

    INDEPENDENT VARIABLE (X) : FERTILIZER (GRAMS)
    DEPENDENT VARIABLE (Y) : YIELD (GRAMS)
    SLOPE : 2.74981414
    ERROR ON SLOPE : .291349548   [4]
    PERCENT ERROR ON SLOPE : 10.5%
    INTERCEPT : 11.0443587
    ERROR ON INTERCEPT : 1.18742069
    PERCENT ERROR ON INTERCEPT : 10.7%
    PEARSON'S R ('FIT') : .978277057
    TYPE OF FIT : LINEAR

Thus, in this case, the equation for the straight line relating the yield and fertilizer would be (with rounding)

    yield = (2.75) * (fertilizer) + 11.04

Since Pearson's r is close to 1, the fit is a good one.

It is also possible to interpolate a value, that is, determine what the calculated value of y would be for a given value of x. For example, if from the Graph menu we choose I for interpolate and enter 10 for the abscissa (the x value), we get a value of the ordinate (y) of about 38.5 grams.

[Problem: A group of 19 students in an English class are given a vocabulary test and a spelling test. The numbers of errors for each are given below. Are the data correlated? What is the equation for the line between them? If a student made 13 errors on the vocabulary test, what would be the likely value for the number of errors on the spelling test?

    Errors on
    Vocabulary  14 19 13 15 14 14 17 10 15 12  9 16 11 15 16  8
    Spelling     6 11  4  8  6  3 11  3  6  5  4  9  3 10 10  4

    Vocabulary  15 17 16
    Spelling     9 11  9

[Answer: The data are somewhat correlated. The points are scattered around the line and the fit is only 0.843. The equation would be:

    spelling errors = (.873) * (vocabulary errors) - 5.27

For a student with 13 vocabulary errors, the most likely number of spelling errors would be 6.]

FOOTNOTES TO THE LEAST SQUARE PROGRAM

[1] The line of best fit has the following characteristic: if y' is the value of the dependent variable as calculated from the equation for the line for each value of the independent variable, x, and y is the actual measured value of the dependent variable for each value of x, then the best fit line has the smallest possible value for the term:

    sum over all the x's of (y' - y)^2

Since, to find the best line, we want to minimize a term that is a square, the procedure is called THE METHOD OF LEAST SQUARES.

[2] Pearson's r has a range of from +1 to -1. A value of +1 means the data are totally correlated and that as x increases, y increases. A value of -1 means that the data are totally correlated and that as x increases, y decreases. A value of 0 for Pearson's r means that the data are totally uncorrelated (that is, the value of x has no effect on the value of y). This program reports only the absolute value of Pearson's r. You can tell if it is negative or positive by looking at the graph.

[3] This program also allows you to transform the data to see if the relationship is linear in some other way. The other fits that we can do are:

    Exponential:  log (y) = a * x + b        ; or y = b * e^(a * x)
    Logarithmic:  y = a * log (x) + b        ; or e^((y - b)/a) = x
    Power:        log (y) = a * log (x) + b  ; or y = b * x^a

These are useful mainly in scientific applications.

[4] The error on the slope is essentially the standard deviation of the slope parameter. The error on the intercept is, essentially, the standard deviation of the intercept.
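For the record, the normal-equation formulas behind a least squares fit are short. A minimal sketch in Python (not part of this package) for the bean project data; the program's printout above differs from these results only in the trailing digits:

    x = [0.50, 1.0, 2.0, 3.6, 5.0, 7.5]       # fertilizer (grams)
    y = [11.8, 15.0, 14.5, 22.0, 26.5, 30.5]  # yield (grams)

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # sums of squares and cross products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                       # about 2.75
    intercept = mean_y - slope * mean_x     # about 11.07
    r = sxy / (sxx * syy) ** 0.5            # about .98

    print(round(slope, 2), round(intercept, 2), round(r, 3))
    print(round(slope * 10 + intercept, 2))   # interpolation at x = 10: about 38.5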
MULTIPLE REGRESSION

What if a variable is a function of two other variables; can we still determine a relationship between the three of them? For instance, could we find the relationship between the yield of the bean plants, the amount of fertilizer and the amount of water provided? We can, using the method of multiple regression. Multiple regression can be expanded to find the relationship between any number of independent variables and a dependent one, but the program on this disk limits you to two independent variables.

We get into the multiple regression program by typing the letter R from the program menu. As an example, suppose that a mathematics teacher believes that the students' grades on his first calculus test will be a linear function of the grades on algebra and geometry placement exams given at the beginning of the course. Here is the data:

    Algebra placement score   14 14  9 10  9 15 19 12 18
    Geometry placement score   7  6  4  4  7  8  9  8  7
    Calculus exam grade       76 64 42 57 69 82 91 76 84

When we enter the data and run the calculation we get the following:

    DATA SET NAME: PREDICTION OF CALCULUS
    NO. OF PNTS : 9

    SYM  VARIABLE
    Y    DEPENDENT  CALCULUS GRADE
    X1   1ST INDEP  ALGEBRA SCORE
    X2   2ND INDEP  GEOMETRY SCORE

    Y = 5.84E+00 * X1 + 1.54E+00 * X2 + 1.17E+01

    CORR. COEF. = 9.23E-01

This means that the relationship is

    Calc. grade = 5.84 * (alg scr) + 1.54 * (geo scr) + 11.7

The 'corr. coef.' is the coefficient of correlation, a statistic similar to Pearson's r, which is a measure of the fit. A value of .923 indicates that the fit is fairly good.

[Problem: A physics instructor believes that the location of a falling object is a function of the time since its release and the square of the time. He takes the following measurements:

    Time (seconds)                    1     2    3      4
    Time squared (seconds squared)    1     4    9     16
    Location (meters)              12.0  14.2  6.6  -10.8

Are the variables correlated? What is the relationship between them? What is the location of the body at 5 seconds?

[Answer: They are perfectly correlated (this is, in fact, the data for an object in free fall in a vacuum). The equation is

    y = 16.9 * x1 - 4.9 * x2 + 0

At 5 seconds (and 25 seconds squared) the object would be at -38 meters.]
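A fit with two independent variables comes from solving three normal equations in the three unknowns a1, a2 and b. Here is a minimal sketch in Python (not part of this package), applied to the free-fall data from the problem above; it recovers the stated equation exactly:

    def dot(u, w):
        return sum(p * q for p, q in zip(u, w))

    def solve3(m, v):
        """Solve a 3 x 3 linear system by Gauss-Jordan elimination."""
        a = [row[:] + [rhs] for row, rhs in zip(m, v)]
        for i in range(3):
            pivot = a[i][i]
            a[i] = [val / pivot for val in a[i]]      # scale row i
            for j in range(3):
                if j != i:                            # clear column i elsewhere
                    factor = a[j][i]
                    a[j] = [vj - factor * vi for vj, vi in zip(a[j], a[i])]
        return [a[k][3] for k in range(3)]

    x1 = [1, 2, 3, 4]               # time (seconds)
    x2 = [1, 4, 9, 16]              # time squared
    y = [12.0, 14.2, 6.6, -10.8]    # location (meters)
    n = len(y)

    # normal equations for y = a1*x1 + a2*x2 + b
    m = [[dot(x1, x1), dot(x1, x2), sum(x1)],
         [dot(x1, x2), dot(x2, x2), sum(x2)],
         [sum(x1),     sum(x2),     n]]
    v = [dot(x1, y), dot(x2, y), sum(y)]

    a1, a2, b = solve3(m, v)
    print(round(a1, 1), round(a2, 1), round(b, 1))   # 16.9, -4.9, 0.0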
THE CHI SQUARE TEST

A. ONE WAY

So far we have been looking at situations where we have made some observations (called the observed, O) and compared them with some theoretical or expected values (called the expected, E) to see if the difference between the two is a significant deviation (such as a bias) or only due to a sampling error (one that results because our sample is of finite size). In all the cases so far, the data have been assumed to be normally distributed. What if they are not normally distributed or, horrors, are grouped by class (A, B, C, etc) instead of being numerical? Can we still determine if there is a significant difference between the observed and the expected? Indeed we can. We use the chi square test. The formula for the calculation of chi square is:

    X^2 = sum of (O - E)^2 / E

The X is the Greek letter chi, which looks very much like an X. The chi square test can be applied to any case where we know the observed and expected values, even one where the data are normally distributed.

We reject or accept our null hypothesis (which is that there is no difference between the observed and expected) by checking our value of chi square in a table of the chi square distribution (for a certain number of degrees of freedom). If our value exceeds the value in the table for a specified level of significance, we reject the null hypothesis; if the two are equal or ours is lower, we accept the null hypothesis.

For instance, suppose that in a certain course the grade distribution had traditionally been:

    Grade              A   B   C   D   F
    Percent receiving  5  20  50  20   5

But when a particular instructor teaches the course, his grade distribution is 26 A's, 92 B's, 270 C's, 101 D's and 14 F's. Is there a significant difference between this instructor's grade distribution and the traditional one?

We type C from the program menu to get to the chi square programs and then O to select the one-way program (one way is when we have only one column of data; two way is when we have rows and columns). Giving the data set the name GRADE DISTRIBUTION and entering the observed data, we are next asked if the expected values are to be entered as numbers (that is, values for each point), as percents, or to be assumed equal for all classes. We choose percents and then enter the expected percentages. Calling the calculation module, we get:

    NAME OF DATA SET: GRADE DISTRIBUTION
    DATA POINTS: 5
    DEGREES OF FREEDOM: 4
    CALCULATED CHI SQUARE: 7.07E+00
    NULL HYPOTHS: OBSERVED AND EXPECTED SAME
    ACCEPT NULL HYPOTHS AT 5.0% LEVEL OF SIG

So we conclude that there is no significant difference between the teacher's grade distribution and the traditional distribution. We can also examine the chi square values point by point, in which case we would see:

    CLASS NAME   CHI SQUARE
    A            2.87E-02
    B            7.35E-01
    C            1.36E+00
    D            1.59E-03
    F            4.94E+00

We see here that the largest contribution to the total chi square came from the low number of F's.

We can also use the Edit module to look at the data. Since there is a complex relationship between the observed and expected values, the Edit module will only allow you to change the data set and class names. If you want to edit the values, you must re-enter them using the Input module.

[Problem: An instructor believes that the further a student sits from the lecture table, the more likely the student is to be absent. The instructor takes attendance for a week and sums the number of absences for each row of seats. The following results are obtained:

    Row       1  2  3   4   5  6   7  8   9  10  11  12
    Absences  7  9  6  11  12  9  10  8  13  10  11  13

Is the instructor correct in this belief?

[Answer: To find out, we run a one way chi square. The null hypothesis is that the observed and expected numbers of absences are the same. The choice for the expected number of absences is that they are all the same, regardless of row. Thus, if we reject the null hypothesis we have reason to believe the instructor is correct. However, the calculation indicates that the null hypothesis is to be accepted - that the observed and expected are the same. Since the expected number was independent of row, there is no evidence that the instructor was correct.]
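The one way chi square is nearly a one-liner once the expected values are in hand. A minimal sketch in Python (not part of this package) for the grade distribution example:

    observed = [26, 92, 270, 101, 14]   # A, B, C, D, F
    expected_pct = [5, 20, 50, 20, 5]   # the traditional percentages

    total = sum(observed)               # 503 students
    expected = [p / 100 * total for p in expected_pct]

    # chi square = sum of (O - E)^2 / E over all the classes
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(round(chi_sq, 2))             # about 7.07, with 4 degrees of freedom
    # The critical chi square for 4 df at the 5% level is 9.49; since
    # 7.07 < 9.49, the null hypothesis is accepted.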
B. TWO WAY

A two way chi square is used when you have both rows and columns of data (for example, the number of students classified by both age and sex). The data entry, calculation and presentation of results are carried out in much the same way as in the one way chi square, with one addition. When you enter the expected values, you can select the Independence Values option.

To illustrate what the independence values are, suppose we had the following observed division of students by age and sex:

                 Age
             Less than 25   More than 25   Total
    Males          18             12          30
    Females        35             37          72
    Total          53             49
                               Grand total   102

The independence values are calculated by assuming that only the totals in the rows and columns affect the expected values. For example, there are 102 students in total, 53, or 51.9 per cent, of whom are under 25. Since we have a total of 30 males, we would expect 51.9 per cent of 30, or 16 (rounded to a whole number), to be less than 25. Along with the other expected values, this gives:

                 Age
             Less than 25   More than 25   Total
    Males          16             14          30
    Females        37             35          72
    Total          53             49
                               Grand total   102

As an example, let's assume that at a particular school the distribution of male and female instructors by highest earned degree was:

              Bachelor  Master  Doctor
    Male          4       69      65
    Female       12       31       9

Is there a difference between the observed values and the expected independence values? Running the two way chi square program we get:

    NAME OF DATA SET: EDUCATION OF FACULTY
    DATA POINTS : 6
    DEGREES OF FREEDOM: 2
    CALCULATED CHI SQUARE: 3.04E+01
    NULL HYPOTHS: OBSERVED AND EXPECTED SAME
    REJECT NULL HYPOTHS AT 0.1% LEVEL OF SIG

The observed and expected values are very different. This means that there is a significant relationship between sex and highest earned degree - the number of faculty in a particular category depends on both. We can also look at the chi square by individual category and at the listing of the data to see that the largest contributions to the total chi square come from the high number of female bachelors (12, expect 4, chi square 16.0) and the low number of female doctors (9, expect 20, chi square 6.05). Take this up with your bargaining agent immediately!

[Problem: A student survey finds the following results for the family income and the likelihood that a student has a part-time job:

    Family Income (in 1000's of $)  <10  10-20  20-30  30-40  >40
    Has Job                          16     59     53     17    3
    No Job                           12     78     67     52   41

What is the chi square for these data and what can you conclude?

[Answer: The chi square is 31.5 and we reject the null hypothesis that the observed and expected are the same. That is, whether the student has a job does depend on the family income. Looking at the data and the individual chi square values, we see that the largest contributions to the total chi square come from the low number of high income students with jobs (3, expect 16, chi square 10.6) and the high number of low income students with jobs (16, expect 10, chi square 3.6).]
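Here is a minimal two way sketch in Python (not part of this package) for the faculty-degree table. Like the program (as described above), it rounds each independence value to a whole number; without the rounding the total comes out a little lower, but the conclusion is the same:

    table = [
        [4, 69, 65],   # male:   bachelor, master, doctor
        [12, 31, 9],   # female: bachelor, master, doctor
    ]

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # independence value: row total times column total over grand total
            expected = round(row_totals[i] * col_totals[j] / grand)
            chi_sq += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    print(round(chi_sq, 1), df)        # about 30.4, with 2 degrees of freedom
    # The critical chi square for 2 df at the 0.1% level is 13.8; since
    # 30.4 > 13.8, the null hypothesis is rejected.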
OTHER STATISTICAL TESTS

The remainder of the statistical tests on this disk are reached by pressing S (See more programs) from the Program menu. It will be necessary to flip the disk in the drive to access them.

SIGN TEST

In most of the analyses we have examined so far we have assumed that the data are normally distributed. What if this is not the case? Can we still analyze them? Yes, but we use what are called nonparametric or distribution free tests. The results of these tests are generally easier to understand than those of the classical tests, and they are useful when the data cannot be quantitatively expressed (for instance, when they are ranked, as one would rank a group of wines). Interestingly, while the data may not be normally distributed, it is possible to calculate statistics from the data which are normally distributed.

The Sign Test (more properly, the Two-sample Sign Test) is one of the simplest of these tests. It is called the Sign Test because it determines whether the difference between paired observations is positive or negative and then uses the number of plus and minus signs to calculate a statistic for accepting or rejecting the null hypothesis.

For example, suppose you wished to see if an exercise program affected the blood pressures of the students in a PE class. Here is the data you wish to analyze:

    Student Name   Pressure Before   Pressure After
                   Exercise          Exercise
    Tom               134               139
    Dick              124               124
    Harry             172               175
    Bob               149               140
    Ted               167               155
    Alice             145               152
    Carol             148               140

Using the Sign Test program, you would enter something such as "BLOOD PRESSURE" for the Name of Data Set, "STUDENT" for the Name of Variable, "BEFORE EXERCISE" for the 1st Condition Name and "AFTER EXERCISE" for the 2nd Condition Name. Then you would enter the names [1] and the before and after exercise blood pressures, and type "END" for the STUDENT entry when this is done. Selecting Calculation from the Main Menu, you get the following results:

    NAME OF DATA SET: BLOOD PRESSURE
    NAME OF VARIABLE: STUDENT

    #  CONDITION EXAMINED   MEAN
    1  BEFORE EXERCISE      1.48E+02
    2  AFTER EXERCISE       1.46E+02

    NUMBER: 7    EFFECTIVE NO.: 6
    NULL HYPTHS: CONDITION HAS NO EFFECT
    PLUSES: 3   MINUSES: 3   TIES: 1
    CALCULATED PROBABILITY: .65625
    ACCEPT NULL HYPOTHS AT 5% LEVEL OF SIG

The result NUMBER (7) is the number of entries (students); EFFECTIVE NO. (6) is the number whose condition changed (pressure before exercise differed from the pressure after); the NULL HYPTHS is the null hypothesis, namely that there was no effect on blood pressure because of exercise; PLUSES (3), MINUSES (3) and TIES (1) are the numbers of entries with positive, negative and no changes in the blood pressures; and CALCULATED PROBABILITY [2] (.65625) is the probability that six independent samples, each of which could either increase or decrease with equal probability, would result in a distribution with 3 or more plus changes. The result you will most likely be most interested in is the final line, which says "ACCEPT NULL HYPOTHS AT 5% LEVEL OF SIG." In this case, it means that we have, at a 5% level of significance, no evidence that the exercise program had any effect on blood pressure.

Problem: To study the effect of room temperature on student performance, 19 students were given two quizzes, one when the temperature was 65 degrees, another when it was 70 degrees. Here are the results. At the 5 per cent level of significance, does temperature have any effect on quiz performance?

    Student  Low Temp  High Temp    Student  Low Temp  High Temp
             Grade     Grade                 Grade     Grade
    A          12          8        B          11         12
    C           6          6        D           8          9
    E          10         12        F           4          7
    G          12         12        H           8          8
    I           7          6        J           7          9
    K           4          3        L           3          2
    M          11         10        N           6          8
    O           9         10        P           6          9
    Q          12          9        R          10         12
    S           5          6

[Answer: The null hypothesis is accepted at the 5 per cent level of significance; that is, temperature has no effect on the grade.]

NOTES ON THE SIGN TEST

[1] When entering the data under the variable column, a carriage return will number the entries in order.

[2] If the effective sample size is 30 or more, the calculated statistic is a z-score.
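The calculated probability is just a binomial tail: the chance of 3 or more pluses in 6 tries when plus and minus are equally likely. A minimal sketch in Python (not part of this package):

    from math import comb

    before = [134, 124, 172, 149, 167, 145, 148]
    after = [139, 124, 175, 140, 155, 152, 140]

    diffs = [a - b for b, a in zip(before, after)]
    pluses = sum(d > 0 for d in diffs)     # 3
    minuses = sum(d < 0 for d in diffs)    # 3
    n = pluses + minuses                   # ties are dropped: effective no. 6

    # probability of 'pluses' or more plus signs out of n fair coin tosses
    prob = sum(comb(n, k) for k in range(pluses, n + 1)) / 2 ** n
    print(pluses, minuses, prob)           # 3, 3, .65625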
THE MANN-WHITNEY U TEST (A RANK SUM TEST)

The assumption made when the means of samples drawn from two populations are compared using Student's t-test (or, for more than two populations, using the analysis of variance program) is that the populations are normally distributed. Often this is not the case. For instance, the ages of students at community colleges are not normally distributed - we have a group of people in their teens and twenties, many less in their thirties to fifties and then a smaller bunch in their sixties.

    [figure: a rough histogram of the ages of community college students,
    with a tall cluster from the late teens through the twenties, scattered
    smaller bars through the thirties to fifties, and a small bump in the
    sixties]

This is not a normally distributed population, and to compare it with a sample from another population requires something other than the programs we have used so far. The method we will use for comparing samples drawn from non-normally distributed populations is a rank-sum test, specifically the Mann-Whitney U test.

Basically, the Mann-Whitney U test (or U-test) combines the two samples into one, ranks the combined sample from low to high and then determines the sum of the ranks for each of the individual samples. For example, suppose sample A consisted of the values 1.0 and 3.2 and sample B of the values 2.6, 4.9 and 3.7.

    Sample A: 1.0, 3.2
    Sample B: 2.6, 4.9, 3.7

    Combined Sample:          1.0, 3.2, 2.6, 4.9, 3.7
    identity --->              A    A    B    B    B

    Ordered Combined Sample:  1.0, 2.6, 3.2, 3.7, 4.9
    identity --->              A    B    A    B    B

    Ranked Combined Sample:     1    2    3    4    5

    Rank sum of A: 1 + 3 = 4
    Rank sum of B: 2 + 4 + 5 = 11

To perform the U-test, we calculate a statistic called U based on the sizes of the samples and the rank sums, and calculate the mean and standard deviation that U would have under the null hypothesis that the two samples were identically distributed. Using the values of U, the mean and the standard deviation, we can then calculate another statistic (in this case a z-score) to see if the null hypothesis is accepted or rejected at a given level of significance.

Example: The ages of 17 Canada students and 16 Skyline students follow.

    Canada:  22, 45, 37, 19, 63, 58, 18, 63, 46, 22, 37, 47, 29, 38, 48, 35, 27
    Skyline: 19, 21, 19, 26, 58, 43, 22, 19, 31, 18, 22, 18, 50, 20, 20, 29

The question (null hypothesis) is: "Do the two colleges have identical age distributions?" To find out, we run the U-test program, calling the Data Set Name "AGES" and the Sample Names "CANADA STUDENTS" and "SKYLINE STUDENTS", respectively. After entering the data (enter END to end data entry) and selecting the Calculation Mode, we get:

    DATA SET NAME : AGES

    #  SAMPLE NAME        NO.  RANK SUM
    1  CANADA STUDENTS    17   353.5
    2  SKYLINE STUDENTS   16   207.5

    MEAN 1: 38.4705883   MEAN 2: 27.1875
    MEAN OF U : 136
    STD. DEV. OF U : 27.7608838
    CALCULATED U : 200.5
    CALCULATED Z : 2.32341306
    NULL HYPOTHESIS: SAMPLE DISTRIBS. SAME
    REJECT NULL HYPOTHIS AT 5% LEVEL OF SIG

The rank sums, mean of U, standard deviation of U, calculated U and calculated Z are given for those interested. Focusing on the conclusion, we see that the null hypothesis (that the two distributions are the same) is rejected at the 5 per cent level of significance [however, it would be accepted at the 2 per cent and lower levels]. We therefore conclude that the age distributions at the two campuses are different. Since the mean age of the Skyline students [27.19] is less than that of the Canada students [38.47], we conclude that Skyline students are, on the average, younger.
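A minimal U-test sketch in Python (not part of this package) for the example above. Tied values share the average of the ranks they occupy, which is why half ranks such as 353.5 appear:

    from math import sqrt

    canada = [22, 45, 37, 19, 63, 58, 18, 63, 46,
              22, 37, 47, 29, 38, 48, 35, 27]
    skyline = [19, 21, 19, 26, 58, 43, 22, 19,
               31, 18, 22, 18, 50, 20, 20, 29]

    combined = sorted(canada + skyline)
    # average rank for each distinct value (ranks start at 1)
    rank = {v: (2 * combined.index(v) + 1 + combined.count(v)) / 2
            for v in set(combined)}

    r1 = sum(rank[v] for v in canada)          # rank sum of sample 1: 353.5
    n1, n2 = len(canada), len(skyline)

    u = r1 - n1 * (n1 + 1) / 2                 # 200.5
    mean_u = n1 * n2 / 2                       # 136
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # about 27.76

    z = (u - mean_u) / sd_u
    print(r1, u, round(z, 2))                  # 353.5, 200.5, z = 2.32
    # |z| = 2.32 exceeds the 5% critical value of 1.96, so the null
    # hypothesis that the two distributions are the same is rejected.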
Problem: A class of 9 students is given extra assistance in the Learning Center while a similar class of 10 students receives no such assistance. The final grades are given below:

    With assistance:    96, 67, 43, 83, 72, 68, 51, 87, 77
    Without assistance: 38, 38, 81, 64, 29, 64, 54, 50, 56, 61

Use the U-test to determine if the distributions are the same at the 5 per cent level of significance.

[Answer: The null hypothesis is rejected at the 5 per cent level of significance. Thus we conclude that the distributions are different. Since the mean grade of those who had assistance [71.56] is higher than that of those who did not [53.5], we can conclude that assistance from the Learning Center is beneficial.]

THE RUNS TEST

Is the data you are analyzing truly random? That is, are you sure that no bias, either intentional or accidental, entered into the sampling procedure? You can determine the randomness of the sample by means of the Runs Test, which is based on the order in which the data was collected.

A run is defined as an unbroken sub-sequence of identical symbols. For instance, the answers on a true-false test might be:

    T T F T F F T T F F T F

which, when broken down into runs, gives us:

    T T / F / T / F F / T T / F F / T / F

Thus we have 4 runs of T and 4 of F. If a sample has too few runs (e.g. T T T T T T T T T T T T = 1 run) or too many (for example T F T F T F T F T F T F = 12 runs), it would not be random. The Runs Test calculates the number of runs of each type, the mean and standard deviation of the number of runs for a random sample of the same size, and then calculates a statistic (a z score) upon which we can accept or reject the null hypothesis that the sequence is random.

Example: The correct answers to a true-false portion of a test are:

    T F F T F F T T T F F T F F T F F T T F
    T T T F F T T F F F T T T F T F T T F F
    T T F F T F T F T F T F T F F T F T T F
    F T T F F T

Is the sample random or is there a pattern? Using the Runs Test, we enter an appropriate name for the Data Set and indicate that the data are by class (see below) and that the two classes will be TRUE and FALSE. After entering the data as T and F (the program will automatically assign different symbols to the two classes, even if the classes are, for instance, TOUGH and TOUGHER) and concluding with an END, the results are calculated and we get the following:

    NAME OF DATA SET: TRUE FALSE TEST
    DATA IS BY CLASS

    #  CLASS EXAMINED   NUMBER
    1  TRUE             3.30E+01
    2  FALSE            3.30E+01

    NULL HYPOTHS: SEQUENCE IS RANDOM
    NUMBER OF RUNS: 41   TOTAL NO.: 66
    MEAN OF NUMBER OF RUNS: 3.40E+01
    STANDARD DEVIATION OF RUNS: 4.03E+00
    CALCULATED Z: 1.74E+00
    ACCEPT NULL HYPTHS AT 5% LEVEL OF SIG

We see that, at the 5 per cent level of significance, the null hypothesis is accepted. The sequence is random.

The program will also accept numerical data for analysis. In this case, the median is calculated and the data are examined for runs of values above or below the median.
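Counting runs is exactly what a grouping routine does. A minimal sketch in Python (not part of this package) for the true-false answer key above:

    from math import sqrt
    from itertools import groupby

    answers = ("T F F T F F T T T F F T F F T F F T T F "
               "T T T F F T T F F F T T T F T F T T F F "
               "T T F F T F T F T F T F T F F T F T T F "
               "F T T F F T").split()

    runs = sum(1 for _key, _group in groupby(answers))   # 41 runs
    n1 = answers.count("T")                              # 33
    n2 = answers.count("F")                              # 33
    n = n1 + n2

    # mean and standard deviation of the number of runs in a random sequence
    mean_runs = 2 * n1 * n2 / n + 1
    sd_runs = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1)))

    z = (runs - mean_runs) / sd_runs
    print(runs, mean_runs, round(z, 2))                  # 41, 34.0, z = 1.74
    # |z| = 1.74 is below the 5% critical value of 1.96, so the null
    # hypothesis that the sequence is random is accepted.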
Problem: A chemistry instructor believes that he need not grade his finals as, he believes, the first 10 per cent of the students handing in the tests will be F's, the next 10 per cent will be A's, the next 70 per cent will be B's and C's and the last 10 per cent will be F's again. Nonetheless, he does grade the exams and, in the order they were handed in, the grades were:

    26, 13, 38, 76, 65, 84, 44, 91, 78, 31, 68,
    86, 94, 41, 78, 84, 96, 12, 87, 43, 34, 78

Should the instructor's hypothesis be accepted or rejected at the 5 per cent level?

[Answer: The hypothesis, as stated, is too complex to analyze with these programs. However, when we do a Runs test on the data, we find that the sequence is randomly distributed about the median (at the 5 per cent level of significance), so the instructor's hypothesis must be rejected. However, he continues to believe it.]

RANK CORRELATION

The least squares method allows you to determine if two normally distributed variables are correlated. If they are not normally distributed, it is still possible to determine if there is a correlation, by means of the Rank Correlation. For example, we might wish to determine if the opinions of two judges of seven horses are correlated. The opinions of a judge need not be normally distributed, since it might well be possible, for example, that, in the judge's opinion, four of the horses are excellent (although different), two are mediocre (but again different) and the seventh is a real dog. While the horses can be ranked from 1 to 7, the distribution is not normal.

The rank correlation coefficient, r, was introduced by Spearman in 1904 to test if there is a correlation between ranked variables. This coefficient is the statistic that is examined to test the null hypothesis. The null hypothesis in this test, by the way, is that the two rankings are DIFFERENT.

For instance, suppose a group of faculty and a group of administrators independently rank twelve different applicants for the position of chief executive officer of a school. Here are the ranks:

    Applicant   Rank by Faculty   Rank by Administrators
    A                 6                     1
    B                 3                     5
    C                10                     3
    D                 7                    10
    E                 1                     9
    F                11                     4
    G                 5                     2
    H                 9                     7
    I                 2                     8
    J                 8                     6
    K                 4                    11
    L                12                    12

Are the rankings of the faculty and administrators the same or different? Running the Rank Correlation program, entering RANKING OF APPLICANTS for the Data Set Name, APPLICANT for the Variable, RANK BY FACULTY for the 1st Condition and RANK BY ADMINISTRATORS for the 2nd Condition, then entering the ranks and finally doing the calculation, we get:

    NAME OF DATA SET: RANKING OF APPLICANTS
    NAME OF VARIABLE: APPLICANT

    #  CONDITION EXAMINED        MEAN
    1  RANK BY FACULTY           6.50E+00
    2  RANK BY ADMINISTRATORS    6.50E+00

    NUMBER: 12
    NULL HYPTHS: CONDITIONS ARE DIFFERENT
    RANK CORRELATION COEFFICIENT: -5.59E-02
    STD. DEV. OF RANK CORR COEF: 3.02E-01
    CALCULATED Z: -1.86E-01
    ACCEPT NULL HYPTHS AT 5% LEVEL OF SIG

The rank correlation coefficient [-.0559] is very close to zero, which generally means that the null hypothesis should be accepted. Indeed, the calculated z score is small (the fact that it is negative is irrelevant) and the null hypothesis is accepted. Thus we would conclude that the rankings by the faculty and the rankings by the administrators are different.

Problem: The program will also allow you to enter numerical data (which it will then rank). For instance, suppose we wished to see if the IQ's of the members of married couples are correlated. Here's the data:

    COUPLE   HUSBAND   WIFE
    A           96      110
    B          114      118
    C          119       99
    D          134      137
    E          142      124
    F          109      108
    G          127      101
    H           99       98
    I          147      153
    J          125      138
    K          127      104

[Answer: When the rank correlation coefficient is calculated, we see that the null hypothesis should be rejected at the 2 per cent level; that is, the IQ's of the husband and the wife are correlated.]
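Spearman's coefficient comes straight from the squared differences between paired ranks. A minimal sketch in Python (not part of this package) for the applicant rankings above:

    from math import sqrt

    faculty = [6, 3, 10, 7, 1, 11, 5, 9, 2, 8, 4, 12]
    admin = [1, 5, 3, 10, 9, 4, 2, 7, 8, 6, 11, 12]
    n = len(faculty)

    # r = 1 - 6 * (sum of squared rank differences) / (n * (n^2 - 1))
    d_sq = sum((f - a) ** 2 for f, a in zip(faculty, admin))
    r = 1 - 6 * d_sq / (n * (n * n - 1))   # about -.0559

    sd = 1 / sqrt(n - 1)                   # about .302
    z = r / sd
    print(round(r, 4), round(z, 2))        # -.0559, z = -.19
    # |z| is far below the 5% critical value of 1.96, so the null hypothesis
    # (here, that the two rankings are different) is accepted.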
PROBABILITY

The Probability program on the Program menu will allow you to calculate the probability of a value, or range of values, in the binomial, normal and Poisson distributions.

For the binomial distribution, you enter the total number in the sample and the probability that a single event will occur. You then ask for the probability that either a specific number of events or a range of numbers will occur. For example, you could say you have 2 objects in your sample and that the probability of a single event happening is 0.5. If you ask what is the probability that 2 events will happen, the program will tell you it is 0.25. If the number of objects in the sample is greater than 30, either the normal or the Poisson approximation to the binomial distribution is used.

For the normal distribution, you must enter the mean and standard deviation. The other information is the same as for the binomial. The program will only calculate probabilities for values within plus or minus three standard deviations of the mean.

For the Poisson distribution, you must enter the mean. The probability of any number of events occurring will be calculated, but if the value is far from the mean, the calculation could take a long time, as factorials are being calculated.

FILE SUFFIXES

When you save a file to disk, the program will add a suffix to it as an identification tag. Here is a list of programs, suffixes and the type of data stored:

    Program              Suffix   Type of data
    MEAN                 .MEAN    Raw data
    MEAN                 .MEST    Analyzed data
    ONE WAY CHI SQUARE   .CS1D    Raw data
    TWO WAY CHI SQUARE   .CS2D    Raw data
    LEAST SQUARES        .LSF     Raw data
    MULTIPLE REGRESSION  .MTRG    Raw data
    SIGN                 .SIGNS   Raw data
    MANN WHITNEY U       .MWUT    Raw data
    RUNS                 .RUNS    Raw data
    RANK CORRELATION     .RCD     Raw data

APPLE II FOREVER!