THE TEACHER'S STATISTICAL PACKAGE

by George W. Goth

These programs are shareware. You may use them as much as you like for as long as you like. You may also give as many copies as you like to any friends, post them on bulletin boards, use them as party favors and so forth. If you find them useful, would you eventually please send $10.00 to

    George Goth
    Skyline College
    3300 College Drive
    San Bruno, CA 94066

Those contributing their share will become registered users and will receive updates when they become available.

- - - - - - - - - - -

GETTING STARTED

It is STRONGLY recommended that you make a back-up copy of the disk and put the original in a safe place. The disk is not copy protected. Be sure to copy BOTH sides of the disk.

Once you have made the backup copy, boot it and in a few moments the monitor screen will display the following:

    THE TEACHER'S STATISTICAL PACKAGE

    WOULD YOU LIKE TO CALCULATE:

    M: MEANS AND STANDARD DEVIATIONS
    T: T-TEST COMPARISON OF MEANS
    A: ANALYSIS OF VARIANCE
    C: CHI-SQUARED ANALYSIS
    L: LEAST SQUARE FITS
    R: MULTIPLE REGRESSIONS
    S: SEE MENU OF MORE PROGRAMS
    Q: QUIT THE PROGRAMS

    PLEASE ENTER A LETTER < >

This is called the PROGRAM MENU and you use it to select the statistical programs. At the bottom of the screen you see the phrase "PLEASE ENTER A LETTER < >" with a flashing box, called the cursor, inside the brackets. You select the program you wish to run by typing the letter corresponding to the program. For example, typing the letter M will run the program that calculates means and standard deviations, and so forth. You need not type a carriage return after typing the letter.

[A NOTE ON ENTERING INFORMATION INTO THE COMPUTER: To use these programs you have to enter two types of information, choices and data. CHOICES are just that; you are given a list of options and you choose one. Whenever you must make a choice, the available choices are either indicated by a list (or menu) or will be a "Y" (for yes) or "N" (for no) to a question (such as "SEND RESULTS TO PRINTER? < >"). Choices are made with single keystrokes with no carriage return. DATA consists of the actual numerical results you want to analyze and the names you give the data. To enter data, type in the correct number or name and then type a carriage return. If you have any qualms about whether to type a carriage return, enter the information and wait a second - if the computer does not immediately do something, it means it is waiting for a carriage return.]

MEAN AND STANDARD DEVIATION

Type a capital letter M to select the Mean and Standard Deviation Program and we will begin our tour of statistics.

If you acquire data, such as by making measurements of something or grading a group of tests, you will almost never find that all the measurements (or tests) have the same value. Rather the values range from a minimum to a maximum, with perhaps a clustering toward the middle. What we would want to know about this data are: a) what is the best representation of its average value; and b) what is the best representation of its variability (i.e. are the values tightly clustered together or spread out).

There are several ways of expressing the average, the three best known being:

a) the mean (also called the arithmetic mean), which is the sum of all the values divided by the number of values. Thus the mean of 1, 2, 3 and 3 is

    mean = (1 + 2 + 3 + 3) / 4 = 2.25

The mean is the most useful of the averages.
If the data collected was from the entire population, the mean is called the population mean; if the data represent only a portion of the population, the mean is the sample mean. We will generally be working with sample means, often designated by an x with a line over it (called x bar).

b) the median, which is the middle value; that is, half the other values are smaller than it and half are larger. For our data from a) the median is 2.5.

c) the mode (from the French for 'fashionable'), which is the value that occurs most often. For our data the mode is 3, as there are two 3's. If two different values occur with the same maximum frequency (as in the case 1, 2, 3, 3, 4, 5, 5), the distribution is bimodal.

There are also several measures of the variability (or dispersion) of the data. The three best known are:

a) the range, the difference between the maximum and minimum values. For our example from above, the range would be 3 - 1 = 2. The range is not a particularly useful measure of dispersion. For instance, you could have a thousand 3's and a single value of 53, and the range would be 50. The range is also 50 for a single 3 and a single 53.

b) the mean deviation. If you were to subtract the mean value from each of the data points (to calculate the deviation of each point) and sum the result, you would get zero:

    1 - 2.25 = -1.25
    2 - 2.25 =  -.25
    3 - 2.25 =   .75
    3 - 2.25 =   .75
         sum     0

The mean deviation is the sum of the absolute values of the deviations, divided by the number of values. In this case it is (1.25 + .25 + .75 + .75) / 4 = .75. The mean deviation tends to underestimate the variability in many cases.

c) the variance, which is the sum of the squares of the individual deviations divided by the number of observations less 1. The positive square root of the variance is called the standard deviation, s. For our sample the standard deviation is

    s = sqrt( ((-1.25)^2 + (-.25)^2 + (.75)^2 + (.75)^2) / (4 - 1) ) = .96

The standard deviation is the most useful of the measures of dispersion, as we will see later.
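If you would like to check these definitions by hand on another machine, here is a minimal sketch in modern Python (Python is not part of this package; the disk programs do all of this for you) that reproduces the numbers above:

    import statistics

    data = [1, 2, 3, 3]

    mean = sum(data) / len(data)            # arithmetic mean: 2.25
    median = statistics.median(data)        # middle value: 2.5
    mode = statistics.mode(data)            # most frequent value: 3
    data_range = max(data) - min(data)      # maximum minus minimum: 2

    deviations = [x - mean for x in data]   # these always sum to zero
    mean_dev = sum(abs(d) for d in deviations) / len(data)   # .75

    # the variance divides the squared deviations by n - 1;
    # its positive square root is the standard deviation s
    variance = sum(d * d for d in deviations) / (len(data) - 1)
    s = variance ** 0.5

    print(mean, median, mode, data_range, mean_dev, round(s, 2))   # s is .96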
With those definitions in mind, let's use the program to calculate some results. When you pressed the letter M, after a few moments you saw the following on the monitor screen:

    MEAN AND STANDARD DEVIATION
    (C) 1986 GEORGE GOTH   V 1.00

    I: INPUT MODULE
    E: EDIT MODULE
    C: CALCULATE MODULE
    D: DISK MODULE
    Q: QUIT

    PLEASE ENTER A CHOICE < >

All the programs are constructed around modules which allow you to enter data (INPUT), change data (EDIT), calculate results (CALCULATE) or store them on disk (DISK). Since no data has been entered yet, let's do that by typing I (no carriage return) to get to the INPUT MODULE.

Once in the Input Module you are asked for a NAME OF DATA SET. This is a tag or ID for the data. If you don't have a name, press return. However, let's assume that you want to know the averages of the ages of a group of mathematics students, so name the data set MATH STUDENT AGES. Now you see a line saying "Point #1" with a flashing cursor after it. You are being asked to enter the age of the first student. Suppose the ages were

    18, 17, 19, 19, 22, 23, 23, 18, 20, 22

Enter each of these numbers (followed by a carriage return) and type the word END to end the data entry and return to the Main Menu.

Suppose you think you made an error in the data entry. Now choose E from the Main Menu to go to the EDIT MODULE. If you do this you will see a new screen, and across the bottom are the words:

    View Data    Order Data    Change Data
    Main Menu    Delete Data

The View Data Option (V) allows you to look at the data to see if you made a mistake. You will be asked if you want to send the data to the printer for a hard copy (Y for yes, N for no). If there are more than 10 data points, the program will pause at every 10th value. The points are numbered sequentially (the first age of 18 is given the number 1, etc).

The Change Data Option (C) allows you to change either the Data Set Name or the Value of a point by selecting either D or V. If you wish to change a value, enter the number of the point you wish to change; the value will be displayed, and you either type in a new value (to correct it) or press return (to keep it).

The Delete Data Option (D) allows you to remove a value from the data. If you wish to do this, type the number of the point and, after its value is displayed, type either a Y for yes or an N for no to delete or keep the point. If a point at the beginning or in the middle of the data is deleted, all the others are renumbered. Thus, if you deleted the first 18, the value of point 1 would become 17.

The Order Data Option (O) allows you to arrange the values from lowest to greatest. This is useful if you want to draw a graph of the data later.

The Main Menu Option (M) returns you to the Main Menu when you have completed your editing.

After editing the data, return to the Main Menu and choose C for the Calculation Module. Before the calculation is done, you will be asked if you want to send the results to the printer (answer Y or N) and if you want to calculate the median and mode [1]. Let's calculate the median and mode in this case. In a few moments, the following appears:

    NAME OF DATA SET: MATH STUDENT AGES
    NUMBER OF DATA POINTS: 10
    MEDIAN: 19.5
    MODE: 23
    MEAN: 20.1
    STANDARD DEVIATION: 2.23358208
    ------------------------------------
    SAVE MEAN AND STD. DEV.    Z SCORES
    MAIN MENU

    PLEASE ENTER A CHOICE < >

The Save Mean and Standard Deviation Option (S) allows you to save the mean and standard deviation [2] in memory for use later. When you press S, the computer will beep, indicating the data has been saved.

The Z Scores Option (Z) will display a set of statistics called, most appropriately, the z scores. The z score for a point is:

    z = (value of point - mean) / standard deviation

Thus the z score for an age of 18 in this collection of ages would be:

    z = (18 - 20.1) / 2.23358208 = -.9401

The closer the size of the z score is to zero, the closer the point is to the mean value. Z scores are discussed more fully below.

If you have not already done so, save these results using the S option and return to the Main Menu. Choose the Input option. Notice that you are warned that data are already in memory. Since you are about to calculate another mean, clear it. Now do the following problem.

Problem: A group of 12 students in an English class have the following ages: 22, 23, 21, 19, 18, 22, 20, 18, 23, 24, 18, 23. Calculate the mean and standard deviation of the ages of this group. Save the results when you are done and return to the Main Menu. [Answer: mean is 20.9166667, standard deviation is 2.23437335.]
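For those curious, the z score arithmetic is easy to reproduce outside the package. A minimal sketch in Python (not part of this package), using the math student ages from above:

    import statistics

    ages = [18, 17, 19, 19, 22, 23, 23, 18, 20, 22]
    mean = statistics.mean(ages)     # 20.1
    s = statistics.stdev(ages)       # 2.2335... (the sample standard deviation)

    # z = (value of point - mean) / standard deviation, for every point
    for age in ages:
        z = (age - mean) / s
        print(age, round(z, 4))      # an age of 18 gives a z of about -.94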
You are now ready to explore the wonders of the Disk Option (D). When you press D, you see the following:

    S: Save to Disk
    F: Fetch from Disk
    C: Catalog Disk
    L: Lock a File
    U: Unlock a File
    R: Rename a File
    D: Delete a File
    M: Main Menu

The Save to Disk Option (S) allows you to save data on the floppy disk. The disk must have sufficient room on it for the data to be stored, so you should have a ProDOS formatted disk named STAT.PRODOS available. You will be asked whether you wish to save Raw Data (the data values only) or a Set of Means (the means and standard deviations of the data sets you have stored in memory, along with their data set names). Choose S to save the set of means, for we will be using them later, and give the set an appropriate name [3], such as STUDENT AGES.

The Fetch from Disk Option (F) allows you to recover raw, unanalyzed data from the disk.

The Catalog Option (C) allows you to examine the contents of the disk to see what programs and files are stored on it.

The Lock a File Option (L) allows you to "lock" a file, that is, prevent another file with an identical name from replacing it. The Unlock a File Option (U) unlocks the file, so that another file with the same name can replace it or so that it can be deleted.

The Rename a File Option (R) allows you to give the file a new name. The Delete a File Option (D) permanently removes the file from the disk.

When done, return to the Main Menu and choose Q (for Quit), which will get you back to the program menu. You can either take a break now or go on to the next section.

FOOTNOTES ON MEAN

[1] If there are several modes, the program will find only the one with the largest value. Median and mode calculations involve ordering and ranking the data, and can take a considerable time if there is a large number of data points.

[2] The program will also store the sum of the squares of the data. This information is used in the ANOVA program.

[3] See the list of file suffixes at the end of this manual.

WHAT ARE WE GOING TO DO WITH THE MEAN?

Now that we know how to calculate the mean and standard deviation, what are we going to do with them? In the next several programs we will compare one mean with another (or others) to see if they could be representatives of the same population. Generally, two (or more) samples of different populations will not have exactly the same values for their means, but because the data are dispersed (as measured by the standard deviation), it could well be possible that the populations do have the same mean. So we wish to develop a series of tests, using the samples drawn from the populations, for determining if the two (or more) populations have identical means.

The method of doing this is called hypothesis testing. A hypothesis is a statement or claim about the nature of the populations. We will focus on what is called the null hypothesis (designated H0) and develop tests for accepting or rejecting this null hypothesis as true or false.

As an example, suppose we wish to see if a coin is 'fair', that is, if we tossed it a very large number of times it would come up heads half the time and tails the other half. But we don't have the time to flip it a very large number of times; we can only flip it, say, 50 times. We do so and it comes up heads 26 times and tails 24 times. Is it truly a fair coin? Or, to put it into statistical jargon, is

    H0: p(heads) = 0.5

to be accepted or rejected? (The symbols mean "the null hypothesis is that the coin will come up heads 50 per cent of the time.")
If we can reject the null hypothesis, then we accept what is called the alternative hypothesis, which in our case would be that the coin does not come up heads 50 per cent of the time. It is obvious from our results (26 heads, 24 tails) that it did not come up heads exactly 50 per cent of the time, but it would not be unreasonable to say that even if we did have a strictly fair coin and flipped it 50 times over and over, some of those times we would get 26 heads and 24 tails. If this result occurred fairly often (say 5 times out of 100), then we could accept that our coin was fair at the 5 per cent level of significance. If a coin came up with 26 heads and 24 tails only very occasionally (say 1 time out of 1,000,000), we would be very doubtful that our coin was truly a fair one (there would be only a one in a million chance that it was); in other words, we could reject the null hypothesis.

We do not have the space here to discuss, much less to develop, the theories behind the procedures that are used to test hypotheses. As the programs were designed to carry out the calculations and state the results, we will present merely a brief discussion of the conditions under which the tests are used. However, we must begin with a short discussion of the term "normal distribution."

For many phenomena, the data are arranged in what is popularly called a 'bell shaped curve.' The curve is more properly called a normal curve, or a Gaussian curve, and it is determined by the value of its mean and its standard deviation; that is, if you know these two values, you can calculate the value of the normal distribution at any point. The mean determines where the central maximum lies and the standard deviation determines the 'flatness' of the curve; that is, if the standard deviation is small, the normal distribution becomes spike-like; if the standard deviation is large, the normal distribution becomes flattened.

If a set of data are normally distributed, then the probability that a value x will be found in the data set is directly related to the value of the normal curve at x. For instance, if the mean of a set of normally distributed data is 10 and the standard deviation is 2, then the probability of finding a data point of value 14 or more is 0.0228 (2.28 per cent). For the same data, the probability of finding a data point of value 16 or more is 0.0013 (0.13 per cent). We will be assuming for the first several programs that the data are normally distributed.

As for the tests we use to test the hypotheses, they are:

a) z scores. As mentioned above, a z score is given by

    z = (value of point - mean) / standard deviation

Therefore a point far from the mean (assuming the standard deviation is small) will have a large z score. Since large z scores imply small probabilities, the likelihood of finding such a point will be small. For instance, the probability of finding a point with a z of 3.09 or more is only 0.1 per cent.

As an example, it can be shown that if you flip a fair coin 500 times, it will come up heads a mean of 250 times with a standard deviation of 11. If we flip a coin 500 times and it comes up heads 280 times, is it a fair coin? First we calculate the z score:

    z = (280 - 250) / 11 = 2.73

If we look this value up in the appropriate table (found in any elementary statistics book), we will find that it corresponds to a probability of 0.0032 (0.32 per cent). That is, if we were to flip a truly fair coin for 10,000 cases of 500 times each (a total of 5 million flips!), only in about 3 cases would we come up with 280 (or more) heads. Thus it would be reasonable to reject the null hypothesis that our coin (the one with a z of 2.73) was a fair one. The z scores are used to test the null hypothesis in the MANN-WHITNEY U, SIGN, RUNS and RANK CORRELATION programs.
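If you have no z table handy, the tail probability can be computed from the error function, which is how such tables are generated in the first place. A minimal sketch in Python (not part of this package) for the coin example above:

    import math

    def upper_tail(z):
        """Probability of a standard normal value at or beyond z."""
        return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

    mean = 250       # expected number of heads in 500 flips of a fair coin
    std_dev = 11     # roughly the square root of 500 * 0.5 * 0.5
    observed = 280

    z = (observed - mean) / std_dev              # about 2.73
    print(round(z, 2), round(upper_tail(z), 4))  # 2.73, about .0032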
b) Student's t. When we have two normally distributed sets of data, with known means and standard deviations, we can test to see if the means are the same (i.e. the null hypothesis is that the means are the same) at a given level of significance by computing a particular statistic called t and checking whether the value of t occurs with the level of significance (or better) we desire. This method of comparing means was developed by William Gosset, who published his results under the name Student. The particular value of t will also depend on what is called the 'degrees of freedom' (df), which is related to the sample size.[1] The t test is used in the programs T-TEST COMPARISON OF MEANS and ANOVA (what is termed a 'multiple t test' is used in the second).

c) the F test (named after Sir Ronald Fisher), which is used when three or more sets of data are to be compared to see if all the means are identical. It is used in the program ANOVA (which stands for analysis of variance).

d) the chi squared test. The chi squared test is used in several statistical applications. In these programs the chi squared test is used to test hypotheses for data which may not be normally distributed and, in fact, may not even be ordinal (that is, have true numerical values). For instance, we assign in our classes the grades A, B, C and so forth. These letters are not ordinal (they are what is termed nominal, that is, used in naming classes, and could be replaced by 1, 2, 3 etc), much less normally distributed, and yet it is still possible to do statistics on such data sets.

With all that in mind, let's check some hypotheses.

FOOTNOTE

[1] If the degrees of freedom are 30 or more, the value of t is approximately equal to the value of z.

T-TEST COMPARISON OF MEANS

The first statistical test we will carry out is Student's t test for comparing means. Boot the disk (if it is not on already) and from the program menu choose T to select T-TEST COMPARISON OF MEANS. Shortly you will see a menu very much like that used in the MEAN program (all but one of these programs have this as their main menu). Since we have already calculated some means and stored the result as the file AGES, let's analyze that. Type D to get to the disk menu and then F to fetch a file. Type the word AGES to load that file and then M to return to the main menu. Then type C for calculation and in a few moments you will see the following result:

    NAME OF DATA SET: AGES
    NUMBER OF VALUES: 2

    #  NAME       NO  MEAN      STD. DEV.
    1  ENGLISH    12  2.09E+01  2.23E+00
    2  MATH       10  2.01E+01  2.23E+00

    STD. ERROR OF DIFFERENCE: 1.00E+00
    T-TEST: 8.14E-01    DEG. OF FREE.: 20
    NULL HYPOTHESIS: MEANS ARE THE SAME
    ACCEPT NULL HYPOTHS AT 5.0% LEVEL OF SIG

Notice that strange looking numbers (like 2.09E+01) occur. This is because some of the numbers are expressed in exponential (or scientific) notation. This allows the program to fit numbers of various lengths (for example 2.09 or .0000289) in slots of the same size on the screen and makes for a pleasant looking format.
In case you are not familiar with this notation, the 'E' means 'times 10 raised to the power of', so:

    2.09E+01 is the same as 2.09 x 10^1  = 20.9
    2.23E+00 is the same as 2.23 x 10^0  = 2.23
    8.14E-01 is the same as 8.14 x 10^-1 = 0.814

The program prints out the set names, means and standard deviations for each of the data sets compared, along with the standard error of the difference (a statistic used in the t test), the value of t and the degrees of freedom. Of most interest to us are the last two lines, the statement of the null hypothesis (the means are the same) and the conclusion, which in this case is that we should accept the null hypothesis at the 5.0 per cent level of significance.

Note that if the null hypothesis is accepted at the 5.0 per cent level of significance, it would also be accepted at the 10 per cent, 20 per cent, 50 per cent etc levels of significance - at any level more than 5.0 per cent. To find out if it would be accepted at, say, 2 per cent, you would have to look up the value of t at 2 per cent for 20 degrees of freedom and see if it were greater than our calculated t (0.814), in which case our hypothesis would be accepted at 2 per cent, or less than 0.814, in which case it would be rejected. The program will only accept or reject at the levels of 5.0 per cent, 1.0 per cent and 0.1 per cent, which are the most commonly used levels in hypothesis testing with the t test.

Let's add some more data. Return to the main menu by pressing M, then get to the Input Module with an I, indicate that you do not wish to clear the current data with an N, and then add the following data for a sample of ages of telecommunications students: Name = TELE COMM, number = 17, mean = 29.6, standard deviation = 3.8. Then type END to end the input module and then C to get back to the calculation.

Now the program will halt and ask you which of the three means you wish to compare. [If you can't remember which number corresponds to which mean, return to the main menu and use the Edit module to view the data.] Let's compare 1 and 3 (that is, the ages of English students with those of telecommunications students). So type 1 (return) and 3 (return) and you will see:

    NAME OF DATA SET: AGES
    NUMBER OF VALUES: 3

    #  NAME        NO  MEAN      STD. DEV.
    1  ENGLISH     12  2.09E+01  2.23E+00
    3  TELE COMM   17  2.96E+01  3.80E+00

    STD. ERROR OF DIFFERENCE: 1.27E+00
    T-TEST: 6.58E+00    DEG. OF FREE.: 27
    NULL HYPOTHESIS: MEANS ARE THE SAME
    REJECT NULL HYPOTHS AT 0.1% LEVEL OF SIG

This time we see that we are to reject the null hypothesis and conclude that there is a difference in the ages of the two types of students at the 0.1 per cent level of significance. If the two means were really the same, there would be less than a 1 in 1000 chance of seeing a difference this large.
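The textbook version of this calculation pools the two sample variances. Here is a minimal sketch in Python (not part of this package); the disk program's own internal formula evidently differs slightly in the trailing digits (it prints a t of .814), but the conclusion is the same:

    import statistics

    english = [22, 23, 21, 19, 18, 22, 20, 18, 23, 24, 18, 23]
    math_ages = [18, 17, 19, 19, 22, 23, 23, 18, 20, 22]

    n1, n2 = len(english), len(math_ages)
    m1, m2 = statistics.mean(english), statistics.mean(math_ages)
    v1, v2 = statistics.variance(english), statistics.variance(math_ages)

    # pool the two sample variances, weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

    t = (m1 - m2) / std_error
    df = n1 + n2 - 2                 # 20 degrees of freedom

    print(round(t, 3), df)           # t is about .85
    # The critical t for 20 df at the 5% level is 2.086; since our t is
    # far below it, we accept the null hypothesis that the means are the same.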
ANALYSIS OF VARIANCE

Suppose you want to compare more than two means; how would you do it? You could, perhaps, do a series of t-tests between the individual means, but there are both practical and theoretical reasons for not doing this. Practically, even a few means to compare would result in many t-tests (9 different means would result in a total of 36 different tests, which, even with a computer, would take a long time). Also, when doing it by pairs other complications arise; in particular, the possibility of what is called a Type I error is increased [1]. The preferred method of comparing more than two means is called an Analysis of Variance (ANOVA), which, not surprisingly, analyzes the variance between several means.

Without going into the details, the ANOVA computes two statistics: the Within Groups Sum of Squares (SSE), a measure of the variability within the samples because of their standard deviations, and the Between Groups Sum of Squares (SSB), a measure of the variability between the samples due to the differences in their means. Appropriate numbers of degrees of freedom and a statistic F are then computed. The null hypothesis that all the means are the same is then rejected or accepted at a level of significance by comparing the calculated F with appropriate values from a table. The levels of significance tested by this program are 5 per cent and 1 per cent, the most commonly used ones.

For instance, suppose we wished to see if school attendance varied with the day of the week. On 18 randomly selected days we take attendance and get the following results:

    Mondays:    143, 128, 110
    Tuesdays:   162, 136, 144, 158
    Wednesdays: 160, 132, 180, 160, 138
    Thursdays:  138, 168, 120
    Fridays:    110, 130, 135

We would first need to calculate the means, standard deviations and an additional term called the sum of the squares, which is done for us by the MEAN program. After we calculate these values and store them on disk, we can use the ANOVA program to analyze the results.

We get to the ANOVA program by typing A from the Program Menu. After we are in the ANOVA program, we fetch the data on the means we have stored by using the Disk module and then go to the Calculate module. We are asked how many of the means we wish to compare, and since we want to analyze all of them, we type 5. Shortly afterwards, the program displays:

    NAME OF DATA SET: ATTENDANCE
    NUMBER OF DATA VALUES: 5

    #  NAME        #  NAME
    1  MONDAY      2  TUESDAY
    3  WEDNESDAY   4  THURSDAY
    5  FRIDAY

    SOURCE OF VAR  DF  SS        MS
    BETWEEN GRPS    4  2.52E+03  6.29E+02
    AMONG GRPS     13  4.00E+03  3.08E+02
    TOTAL          17  6.52E+03

    F = 2.05E+00
    NULL HYPOTHSIS: ALL MEANS ARE SAME
    ACCEPT NULL HYPOTHSIS AT 5% LEVEL OF SIG

The result, perhaps surprisingly, is that the null hypothesis is accepted at the 5 per cent level of significance. There is no evidence that the attendance varies with the day of the week.

[Problem: For a biology class project a student grows bean plants utilizing four different fertilizers. She then collects and weighs the beans produced by each plant. From the data given below, determine if the fertilizers have any effect on the yield of the beans. Also determine which, if any, of the fertilizers produce significantly different yields.

    Fertilizer    Yield (in kilograms)
    Macro-gro     3.7, 4.4, 3.8, 4.2
    Multi-yield   3.6, 3.5, 4.1, 3.9, 3.8
    Poly-bean     6.3, 5.9, 4.8, 6.0, 5.8, 4.9
    Legume-orama  2.9, 3.3, 2.8

[Answer: Using the MEAN program to calculate the means and other needed statistics and then the ANOVA program to do the analysis, we conclude that, at the 1 per cent level of significance, the null hypothesis that the mean yields are the same should be rejected. But how do we answer the second part, that is, how do we determine which, if any, of the fertilizers differ from each other? To do this, we use the T-test Option (T) at the bottom of the Calculation Module screen. This allows us to compare any two means using a multiple t-test. If we compare Macro-gro (1) with Multi-yield (2), we see that at the 5 per cent level of significance, the means are the same. On the other hand, the yields of Macro-gro and Poly-bean (3) are different at the 0.1 per cent level of significance, and so on.]

FOOTNOTES ON ANOVA

[1] A Type I error is when the null hypothesis is true but is incorrectly rejected. A Type II error is when the null hypothesis is false but is incorrectly accepted. Most null hypothesis testing is based on Type I errors.
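Here is a minimal one-way ANOVA sketch in Python (not part of this package) for the attendance data; it reproduces the sums of squares and F shown above:

    import statistics

    groups = {
        "Monday":    [143, 128, 110],
        "Tuesday":   [162, 136, 144, 158],
        "Wednesday": [160, 132, 180, 160, 138],
        "Thursday":  [138, 168, 120],
        "Friday":    [110, 130, 135],
    }

    all_values = [v for g in groups.values() for v in g]
    grand_mean = statistics.mean(all_values)

    # between-groups sum of squares: each group's size times the squared
    # distance of its mean from the grand mean
    ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
              for g in groups.values())

    # within-groups sum of squares: each value's squared distance from
    # its own group mean
    sse = sum((v - statistics.mean(g)) ** 2
              for g in groups.values() for v in g)

    df_between = len(groups) - 1                # 4
    df_within = len(all_values) - len(groups)   # 13

    f = (ssb / df_between) / (sse / df_within)
    print(round(ssb), round(sse), round(f, 2))  # about 2517, 4000, F = 2.05
    # The critical F for (4, 13) df at the 5% level is 3.18; since
    # 2.05 < 3.18, the null hypothesis that all means are the same is accepted.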
THE METHOD OF LEAST SQUARES

In the section on ANOVA we learned how a group of data sets could be compared to learn if their means were the same within a certain error. Thus, for instance, if a student placed varying amounts of fertilizer on bean plants and measured the yields, she could determine if all the means were the same, or if not, which were different. However, we can extract more information from this type of experiment. We can learn if the amount of fertilizer and the yield are 'linearly correlated'. The term linearly correlated means that if the amount of fertilizer is increased, the yield increases (or perhaps decreases) by a fixed amount; that is, the two variables are related by a straight line.

A straight line is fully characterized by its slope and intercept. The slope is the amount the dependent variable (in this case the yield) changes when the independent variable (in this case the amount of fertilizer) increases by one unit, and is generally given the letter 'm'. The intercept is the value of the dependent variable when the independent variable has a value of 0, and is most often given the symbol 'b'. The independent variable is generally given the letter 'x' and the dependent the letter 'y'. The equation for the straight line is then:

    y = m * x + b

(the * symbol means multiply).

We could attempt to find the relationship by plotting the dependent variable (yield) on the vertical axis (the ordinate) and the independent variable on the horizontal axis (the abscissa) of a graph, drawing a straight line through the data points and then extracting the slope and intercept by hand. However, the question would then be: "Since an infinite number of straight lines can be drawn on the graph, how do we know we have drawn the one that best represents the correlation?" In other words, how do we know we have drawn the 'line of best fit'? The line of best fit tells us if the dependent variable (the yield in this case) is directly affected by the independent variable (the amount of fertilizer).

The line of best fit is calculated by a method called 'least squares' [1], and the least squares program on this disk does this and reports the slope and intercept of the best fit line. A statistic, called Pearson's r, is also calculated. Pearson's r is a measure of the 'goodness' of the fit, and it has the following properties: a) if r = 1 the data are totally correlated; b) if r = 0 then either there is no relationship between x and y, or the relationship is not linear; c) the closer the value of r is to 1, the more closely the calculated values lie to the measured ones. An r of .9 indicates a good fit. [2]

Choose L from the program menu and we can use it to analyze the following data. A student puts varying amounts (in grams) of fertilizer on a series of bean plants. At a fixed time, she harvests the beans and measures the yield (in grams). She obtains the following data:

    fertilizer (grams)   .50  1.0  2.0  3.6  5.0  7.5
    yield (grams)       11.8 15.0 14.5 22.0 26.5 30.5

What is the equation for the best line relating the yield and fertilizer? What is the value of r?

We choose the Input module, give the data set the name BEAN PROJECT, the Independent variable the name FERTILIZER (GRAMS), the Dependent YIELD (GRAMS), then enter the numerical data and end with an END.
Then from the main menu we choose G for graph and then L for a linear fit [3], the program calculates the results and.... Lo and behold! A graph! Yes, the program draws a scatter diagram of the data and the best straight line. Plus, it gives us, at the bottom of the screen, the values for the slope, intercept and fit! We can get a hardcopy of these results (the values, not, alas, the graph) by pressing any key and then typing an R (for send Results to printer). This gives us:

    INDEPENDENT VARIABLE (X) : FERTILIZER (GRAMS)
    DEPENDENT VARIABLE (Y) : YIELD (GRAMS)
    SLOPE : 2.74981414
    ERROR ON SLOPE : .291349548   [4]
    PERCENT ERROR ON SLOPE : 10.5%
    INTERCEPT : 11.0443587
    ERROR ON INTERCEPT : 1.18742069
    PERCENT ERROR ON INTERCEPT : 10.7%
    PEARSON'S R ('FIT') : .978277057
    TYPE OF FIT : LINEAR

Thus, in this case, the equation for the straight line relating the yield and fertilizer would be (with rounding)

    yield = (2.75) * (fertilizer) + 11.04

Since Pearson's r is close to 1, the fit is a good one.

It is also possible to interpolate a value, that is, determine what the calculated value of y would be for a given value of x. For example, if from the Graph menu we choose I for interpolate and enter 10 for the abscissa (the x value), we get a value of the ordinate (y) of about 38.5 grams.

[Problem: A group of 19 students in an English class are given a vocabulary test and a spelling test. The numbers of errors for each are given below. Are the data correlated? What is the equation for the line between them? If a student made 13 errors on the vocabulary test, what would be the likely value for the number of errors on the spelling test?

    Errors on
    Vocabulary  14 19 13 15 14 14 17 10 15 12  9 16 11 15 16  8
    Spelling     6 11  4  8  6  3 11  3  6  5  4  9  3 10 10  4

    Vocabulary  15 17 16
    Spelling     9 11  9

[Answer: The data are somewhat correlated. The points are scattered around the line and the fit is only 0.843. The equation would be:

    spelling errors = (.873) * (vocabulary errors) - 5.27

For a student with 13 vocabulary errors, the most likely number of spelling errors would be 6.]

FOOTNOTES TO THE LEAST SQUARE PROGRAM

[1] The line of best fit has the following characteristic: if y' is the value of the dependent variable as calculated from the equation for the line for each value of the independent variable, x, and y is the actual measured value of the dependent variable for each value of x, then the best fit line has the smallest possible value for the term:

    sum over all the x's of (y' - y)^2

Since, to find the best line, we want to minimize a term that is a square, the procedure is called THE METHOD OF LEAST SQUARES.

[2] Pearson's r has a range of from +1 to -1. A value of +1 means the data are totally correlated and that as x increases, y increases. A value of -1 means that the data are totally correlated and that as x increases, y decreases. A value of 0 for Pearson's r means that the data are totally uncorrelated (that is, the value of x has no effect on the value of y). This program reports only the absolute value of Pearson's r. You can tell if it is negative or positive by looking at the graph.

[3] This program also allows you to transform the data to see if the relationship is linear in some other way. The other fits that we can do are:

    Exponential:  log (y) = a * x + b        ; or y = b * e^(a * x)
    Logarithmic:  y = a * log (x) + b        ; or e^((y - b)/a) = x
    Power:        log (y) = a * log (x) + b  ; or y = b * x^a

These are useful mainly in scientific applications.

[4] The error on the slope is essentially the standard deviation of the slope parameter. The error on the intercept is, essentially, the standard deviation of the intercept.
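For the record, the normal-equation formulas behind a least squares fit are short. A minimal sketch in Python (not part of this package) for the bean project data; the program's printout above differs from these results only in the trailing digits:

    x = [0.50, 1.0, 2.0, 3.6, 5.0, 7.5]       # fertilizer (grams)
    y = [11.8, 15.0, 14.5, 22.0, 26.5, 30.5]  # yield (grams)

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # sums of squares and cross products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                       # about 2.75
    intercept = mean_y - slope * mean_x     # about 11.07
    r = sxy / (sxx * syy) ** 0.5            # about .98

    print(round(slope, 2), round(intercept, 2), round(r, 3))
    print(round(slope * 10 + intercept, 2))   # interpolation at x = 10: about 38.5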
MULTIPLE REGRESSION

What if a variable is a function of two other variables; can we still determine a relationship between the three of them? For instance, could we find the relationship between the yield of the bean plants, the amount of fertilizer and the amount of water provided? We can, using the method of multiple regression. Multiple regression can be expanded to find the relationship between any number of independent variables and a dependent one, but the program on this disk limits you to two independent variables.

We get into the multiple regression program by typing the letter R from the program menu. As an example, suppose that a mathematics teacher believes that the students' grades on his first calculus test will be a linear function of the grades on algebra and geometry placement exams given at the beginning of the course. Here is the data:

    Algebra placement score   14 14  9 10  9 15 19 12 18
    Geometry placement score   7  6  4  4  7  8  9  8  7
    Calculus exam grade       76 64 42 57 69 82 91 76 84

When we enter the data and run the calculation we get the following:

    DATA SET NAME: PREDICTION OF CALCULUS
    NO. OF PNTS : 9

    SYM  VARIABLE
    Y    DEPENDENT  CALCULUS GRADE
    X1   1ST INDEP  ALGEBRA SCORE
    X2   2ND INDEP  GEOMETRY SCORE

    Y = 5.84E+00 * X1 + 1.54E+00 * X2 + 1.17E+01

    CORR. COEF. = 9.23E-01

This means that the relationship is

    Calc. grade = 5.84 * (alg scr) + 1.54 * (geo scr) + 11.7

The 'corr. coef.' is the coefficient of correlation, a statistic similar to Pearson's r, which is a measure of the fit. A value of .923 indicates that the fit is fairly good.

[Problem: A physics instructor believes that the location of a falling object is a function of the time since its release and the square of the time. He takes the following measurements:

    Time (seconds)                    1     2    3      4
    Time squared (seconds squared)    1     4    9     16
    Location (meters)              12.0  14.2  6.6  -10.8

Are the variables correlated? What is the relationship between them? What is the location of the body at 5 seconds?

[Answer: They are perfectly correlated (this is, in fact, the data for an object in free fall in a vacuum). The equation is

    y = 16.9 * x1 - 4.9 * x2 + 0

At 5 seconds (and 25 seconds squared) the object would be at -38 meters.]
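A fit with two independent variables comes from solving three normal equations in the three unknowns a1, a2 and b. Here is a minimal sketch in Python (not part of this package), applied to the free-fall data from the problem above; it recovers the stated equation exactly:

    def dot(u, w):
        return sum(p * q for p, q in zip(u, w))

    def solve3(m, v):
        """Solve a 3 x 3 linear system by Gauss-Jordan elimination."""
        a = [row[:] + [rhs] for row, rhs in zip(m, v)]
        for i in range(3):
            pivot = a[i][i]
            a[i] = [val / pivot for val in a[i]]      # scale row i
            for j in range(3):
                if j != i:                            # clear column i elsewhere
                    factor = a[j][i]
                    a[j] = [vj - factor * vi for vj, vi in zip(a[j], a[i])]
        return [a[k][3] for k in range(3)]

    x1 = [1, 2, 3, 4]               # time (seconds)
    x2 = [1, 4, 9, 16]              # time squared
    y = [12.0, 14.2, 6.6, -10.8]    # location (meters)
    n = len(y)

    # normal equations for y = a1*x1 + a2*x2 + b
    m = [[dot(x1, x1), dot(x1, x2), sum(x1)],
         [dot(x1, x2), dot(x2, x2), sum(x2)],
         [sum(x1),     sum(x2),     n]]
    v = [dot(x1, y), dot(x2, y), sum(y)]

    a1, a2, b = solve3(m, v)
    print(round(a1, 1), round(a2, 1), round(b, 1))   # 16.9, -4.9, 0.0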
THE CHI SQUARE TEST

A. ONE WAY

So far we have been looking at situations where we have made some observations (called the observed, O) and compared them with some theoretical or expected values (called the expected, E) to see if the difference between the two is a significant deviation (such as a bias) or only due to a sampling error (one that results because our sample is of finite size). In all the cases so far, the data have been assumed to be normally distributed. What if they are not normally distributed or, horrors, are grouped by class (A, B, C, etc) instead of being numerical? Can we still determine if there is a significant difference between the observed and the expected? Indeed we can. We use the chi square test. The formula for the calculation of chi square is:

    X^2 = sum of (O - E)^2 / E

The X is the Greek letter chi, which looks very much like an X. The chi square test can be applied to any case where we know the observed and expected values, even one where the data are normally distributed.

We reject or accept our null hypothesis (which is that there is no difference between the observed and expected) by checking our value of chi square in a table of the chi square distribution (for a certain number of degrees of freedom). If our value exceeds the value in the table for a specified level of significance, we reject the null hypothesis; if the two are equal or ours is lower, we accept the null hypothesis.

For instance, suppose that in a certain course the grade distribution had traditionally been:

    Grade              A   B   C   D   F
    Percent receiving  5  20  50  20   5

But when a particular instructor teaches the course, his grade distribution is 26 A's, 92 B's, 270 C's, 101 D's and 14 F's. Is there a significant difference between this instructor's grade distribution and the traditional one?

We type C from the program menu to get to the chi square programs and then O to select the one-way program (one way is when we have only one column of data; two way is when we have rows and columns). Giving the data set the name GRADE DISTRIBUTION and entering the observed data, we are next asked if the expected values are to be entered as numbers (that is, values for each point), as percents, or to be assumed equal for all classes. We choose percents and then enter the expected percentages. Calling the calculation module, we get:

    NAME OF DATA SET: GRADE DISTRIBUTION
    DATA POINTS: 5
    DEGREES OF FREEDOM: 4
    CALCULATED CHI SQUARE: 7.07E+00
    NULL HYPOTHS: OBSERVED AND EXPECTED SAME
    ACCEPT NULL HYPOTHS AT 5.0% LEVEL OF SIG

So we conclude that there is no significant difference between the teacher's grade distribution and the traditional distribution. We can also examine the chi square values point by point, in which case we would see:

    CLASS NAME   CHI SQUARE
    A            2.87E-02
    B            7.35E-01
    C            1.36E+00
    D            1.59E-03
    F            4.94E+00

We see here that the largest contribution to the total chi square came from the low number of F's.

We can also use the Edit module to look at the data. Since there is a complex relationship between the observed and expected values, the Edit module will only allow you to change the data set and class names. If you want to edit the values, you must re-enter them using the Input module.

[Problem: An instructor believes that the further a student sits from the lecture table, the more likely the student is to be absent. The instructor takes attendance for a week and sums the number of absences for each row of seats. The following results are obtained:

    Row       1  2  3   4   5  6   7  8   9  10  11  12
    Absences  7  9  6  11  12  9  10  8  13  10  11  13

Is the instructor correct in this belief?

[Answer: To find out, we run a one way chi square. The null hypothesis is that the observed and expected numbers of absences are the same. The choice for the expected number of absences is that they are all the same, regardless of row. Thus, if we reject the null hypothesis we have reason to believe the instructor is correct. However, the calculation indicates that the null hypothesis is to be accepted - that the observed and expected are the same. Since the expected number was independent of row, there is no evidence that the instructor was correct.]
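The one way chi square is nearly a one-liner once the expected values are in hand. A minimal sketch in Python (not part of this package) for the grade distribution example:

    observed = [26, 92, 270, 101, 14]   # A, B, C, D, F
    expected_pct = [5, 20, 50, 20, 5]   # the traditional percentages

    total = sum(observed)               # 503 students
    expected = [p / 100 * total for p in expected_pct]

    # chi square = sum of (O - E)^2 / E over all the classes
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(round(chi_sq, 2))             # about 7.07, with 4 degrees of freedom
    # The critical chi square for 4 df at the 5% level is 9.49; since
    # 7.07 < 9.49, the null hypothesis is accepted.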
B. TWO WAY

A two way chi square is used when you have both rows and columns of data (for example, the number of students classified by both age and sex). The data entry, calculation and presentation of results are carried out in much the same way as in the one way chi square, with one addition. When you enter the expected values, you can select the Independence Values option.

To illustrate what the independence values are, suppose we had the following observed division of students by age and sex:

                 Age
             Less than 25   More than 25   Total
    Males          18             12          30
    Females        35             37          72
    Total          53             49
                               Grand total   102

The independence values are calculated by assuming that only the totals in the rows and columns affect the expected values. For example, there are 102 students in total, 53, or 51.9 per cent, of whom are under 25. Since we have a total of 30 males, we would expect 51.9 per cent of 30, or 16 (rounded to a whole number), to be less than 25. Along with the other expected values, this gives:

                 Age
             Less than 25   More than 25   Total
    Males          16             14          30
    Females        37             35          72
    Total          53             49
                               Grand total   102

As an example, let's assume that at a particular school the distribution of male and female instructors by highest earned degree was:

              Bachelor  Master  Doctor
    Male          4       69      65
    Female       12       31       9

Is there a difference between the observed values and the expected independence values? Running the two way chi square program we get:

    NAME OF DATA SET: EDUCATION OF FACULTY
    DATA POINTS : 6
    DEGREES OF FREEDOM: 2
    CALCULATED CHI SQUARE: 3.04E+01
    NULL HYPOTHS: OBSERVED AND EXPECTED SAME
    REJECT NULL HYPOTHS AT 0.1% LEVEL OF SIG

The observed and expected values are very different. This means that there is a significant relationship between sex and highest earned degree - the number of faculty in a particular category depends on both. We can also look at the chi square by individual category and at the listing of the data to see that the largest contributions to the total chi square come from the high number of female bachelors (12, expect 4, chi square 16.0) and the low number of female doctors (9, expect 20, chi square 6.05). Take this up with your bargaining agent immediately!

[Problem: A student survey finds the following results for the family income and the likelihood that a student has a part-time job:

    Family Income (in 1000's of $)  <10  10-20  20-30  30-40  >40
    Has Job                          16     59     53     17    3
    No Job                           12     78     67     52   41

What is the chi square for these data and what can you conclude?

[Answer: The chi square is 31.5 and we reject the null hypothesis that the observed and expected are the same. That is, whether the student has a job does depend on the family income. Looking at the data and the individual chi square values, we see that the largest contributions to the total chi square come from the low number of high income students with jobs (3, expect 16, chi square 10.6) and the high number of low income students with jobs (16, expect 10, chi square 3.6).]
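Here is a minimal two way sketch in Python (not part of this package) for the faculty-degree table. Like the program (as described above), it rounds each independence value to a whole number; without the rounding the total comes out a little lower, but the conclusion is the same:

    table = [
        [4, 69, 65],   # male:   bachelor, master, doctor
        [12, 31, 9],   # female: bachelor, master, doctor
    ]

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # independence value: row total times column total over grand total
            expected = round(row_totals[i] * col_totals[j] / grand)
            chi_sq += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    print(round(chi_sq, 1), df)        # about 30.4, with 2 degrees of freedom
    # The critical chi square for 2 df at the 0.1% level is 13.8; since
    # 30.4 > 13.8, the null hypothesis is rejected.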
OTHER STATISTICAL TESTS

The remainder of the statistical tests on this disk are reached by pressing S (See more programs) from the Program menu. It will be necessary to flip the disk in the drive to access them.

SIGN TEST

In most of the analyses we have examined so far we have assumed that the data are normally distributed. What if this is not the case? Can we still analyze them? Yes, but we use what are called nonparametric or distribution free tests. The results of these tests are generally easier to understand than those of the classical tests, and they are useful when the data cannot be quantitatively expressed (for instance, when they are ranked, as one would rank a group of wines). Interestingly, while the data may not be normally distributed, it is possible to calculate statistics from the data which are normally distributed.

The Sign Test (more properly, the Two-sample Sign Test) is one of the simplest of these tests. It is called the Sign Test because it determines whether the difference between paired observations is positive or negative and then uses the number of plus and minus signs to calculate a statistic for accepting or rejecting the null hypothesis.

For example, suppose you wished to see if an exercise program affected the blood pressures of the students in a PE class. Here is the data you wish to analyze:

    Student Name   Pressure Before   Pressure After
                   Exercise          Exercise
    Tom               134               139
    Dick              124               124
    Harry             172               175
    Bob               149               140
    Ted               167               155
    Alice             145               152
    Carol             148               140

Using the Sign Test program, you would enter something such as "BLOOD PRESSURE" for the Name of Data Set, "STUDENT" for the Name of Variable, "BEFORE EXERCISE" for the 1st Condition Name and "AFTER EXERCISE" for the 2nd Condition Name. Then you would enter the names [1] and the before and after exercise blood pressures, and type "END" for the STUDENT entry when this is done. Selecting Calculation from the Main Menu, you get the following results:

    NAME OF DATA SET: BLOOD PRESSURE
    NAME OF VARIABLE: STUDENT

    #  CONDITION EXAMINED   MEAN
    1  BEFORE EXERCISE      1.48E+02
    2  AFTER EXERCISE       1.46E+02

    NUMBER: 7    EFFECTIVE NO.: 6
    NULL HYPTHS: CONDITION HAS NO EFFECT
    PLUSES: 3   MINUSES: 3   TIES: 1
    CALCULATED PROBABILITY: .65625
    ACCEPT NULL HYPOTHS AT 5% LEVEL OF SIG

The result NUMBER (7) is the number of entries (students); EFFECTIVE NO. (6) is the number whose condition changed (pressure before exercise differed from the pressure after); the NULL HYPTHS is the null hypothesis, namely that there was no effect on blood pressure because of exercise; PLUSES (3), MINUSES (3) and TIES (1) are the numbers of entries with positive, negative and no changes in the blood pressures; and CALCULATED PROBABILITY [2] (.65625) is the probability that six independent samples, each of which could either increase or decrease with equal probability, would result in a distribution with 3 or more plus changes. The result you will most likely be most interested in is the final line, which says "ACCEPT NULL HYPOTHS AT 5% LEVEL OF SIG." In this case, it means that we have, at a 5% level of significance, no evidence that the exercise program had any effect on blood pressure.

Problem: To study the effect of room temperature on student performance, 19 students were given two quizzes, one when the temperature was 65 degrees, another when it was 70 degrees. Here are the results. At the 5 per cent level of significance, does temperature have any effect on quiz performance?

    Student  Low Temp  High Temp    Student  Low Temp  High Temp
             Grade     Grade                 Grade     Grade
    A          12          8        B          11         12
    C           6          6        D           8          9
    E          10         12        F           4          7
    G          12         12        H           8          8
    I           7          6        J           7          9
    K           4          3        L           3          2
    M          11         10        N           6          8
    O           9         10        P           6          9
    Q          12          9        R          10         12
    S           5          6

[Answer: The null hypothesis is accepted at the 5 per cent level of significance; that is, temperature has no effect on the grade.]

NOTES ON THE SIGN TEST

[1] When entering the data under the variable column, a carriage return will number the entries in order.

[2] If the effective sample size is 30 or more, the calculated statistic is a z-score.
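The calculated probability is just a binomial tail: the chance of 3 or more pluses in 6 tries when plus and minus are equally likely. A minimal sketch in Python (not part of this package):

    from math import comb

    before = [134, 124, 172, 149, 167, 145, 148]
    after = [139, 124, 175, 140, 155, 152, 140]

    diffs = [a - b for b, a in zip(before, after)]
    pluses = sum(d > 0 for d in diffs)     # 3
    minuses = sum(d < 0 for d in diffs)    # 3
    n = pluses + minuses                   # ties are dropped: effective no. 6

    # probability of 'pluses' or more plus signs out of n fair coin tosses
    prob = sum(comb(n, k) for k in range(pluses, n + 1)) / 2 ** n
    print(pluses, minuses, prob)           # 3, 3, .65625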
THE MANN-WHITNEY U TEST (A RANK SUM TEST)

The assumption made when the means of samples drawn from two populations are compared using Student's t-test (or, for more than two populations, using the analysis of variance program) is that the populations are normally distributed. Often this is not the case. For instance, the ages of students at community colleges are not normally distributed - we have a group of people in their teens and twenties, many less in their thirties to fifties and then a smaller bunch in their sixties.

    [figure: a rough histogram of the ages of community college students,
    with a tall cluster from the late teens through the twenties, scattered
    smaller bars through the thirties to fifties, and a small bump in the
    sixties]

This is not a normally distributed population, and to compare it with a sample from another population requires something other than the programs we have used so far. The method we will use for comparing samples drawn from non-normally distributed populations is a rank-sum test, specifically the Mann-Whitney U test.

Basically, the Mann-Whitney U test (or U-test) combines the two samples into one, ranks the combined sample from low to high and then determines the sum of the ranks for each of the individual samples. For example, suppose sample A consisted of the values 1.0 and 3.2 and sample B of the values 2.6, 4.9 and 3.7.

    Sample A: 1.0, 3.2
    Sample B: 2.6, 4.9, 3.7

    Combined Sample:          1.0, 3.2, 2.6, 4.9, 3.7
    identity --->              A    A    B    B    B

    Ordered Combined Sample:  1.0, 2.6, 3.2, 3.7, 4.9
    identity --->              A    B    A    B    B

    Ranked Combined Sample:     1    2    3    4    5

    Rank sum of A: 1 + 3 = 4
    Rank sum of B: 2 + 4 + 5 = 11

To perform the U-test, we calculate a statistic called U based on the sizes of the samples and the rank sums, and calculate the mean and standard deviation that U would have under the null hypothesis that the two samples were identically distributed. Using the values of U, the mean and the standard deviation, we can then calculate another statistic (in this case a z-score) to see if the null hypothesis is accepted or rejected at a given level of significance.

Example: The ages of 17 Canada students and 16 Skyline students follow.

    Canada:  22, 45, 37, 19, 63, 58, 18, 63, 46, 22, 37, 47, 29, 38, 48, 35, 27
    Skyline: 19, 21, 19, 26, 58, 43, 22, 19, 31, 18, 22, 18, 50, 20, 20, 29

The question (null hypothesis) is: "Do the two colleges have identical age distributions?" To find out, we run the U-test program, calling the Data Set Name "AGES" and the Sample Names "CANADA STUDENTS" and "SKYLINE STUDENTS", respectively. After entering the data (enter END to end data entry) and selecting the Calculation Mode, we get:

    DATA SET NAME : AGES

    #  SAMPLE NAME        NO.  RANK SUM
    1  CANADA STUDENTS    17   353.5
    2  SKYLINE STUDENTS   16   207.5

    MEAN 1: 38.4705883   MEAN 2: 27.1875
    MEAN OF U : 136
    STD. DEV. OF U : 27.7608838
    CALCULATED U : 200.5
    CALCULATED Z : 2.32341306
    NULL HYPOTHESIS: SAMPLE DISTRIBS. SAME
    REJECT NULL HYPOTHIS AT 5% LEVEL OF SIG

The rank sums, mean of U, standard deviation of U, calculated U and calculated Z are given for those interested. Focusing on the conclusion, we see that the null hypothesis (that the two distributions are the same) is rejected at the 5 per cent level of significance [however, it would be accepted at the 2 per cent and lower levels]. We therefore conclude that the age distributions at the two campuses are different. Since the mean age of the Skyline students [27.19] is less than that of the Canada students [38.47], we conclude that Skyline students are, on the average, younger.
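A minimal U-test sketch in Python (not part of this package) for the example above. Tied values share the average of the ranks they occupy, which is why half ranks such as 353.5 appear:

    from math import sqrt

    canada = [22, 45, 37, 19, 63, 58, 18, 63, 46,
              22, 37, 47, 29, 38, 48, 35, 27]
    skyline = [19, 21, 19, 26, 58, 43, 22, 19,
               31, 18, 22, 18, 50, 20, 20, 29]

    combined = sorted(canada + skyline)
    # average rank for each distinct value (ranks start at 1)
    rank = {v: (2 * combined.index(v) + 1 + combined.count(v)) / 2
            for v in set(combined)}

    r1 = sum(rank[v] for v in canada)          # rank sum of sample 1: 353.5
    n1, n2 = len(canada), len(skyline)

    u = r1 - n1 * (n1 + 1) / 2                 # 200.5
    mean_u = n1 * n2 / 2                       # 136
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # about 27.76

    z = (u - mean_u) / sd_u
    print(r1, u, round(z, 2))                  # 353.5, 200.5, z = 2.32
    # |z| = 2.32 exceeds the 5% critical value of 1.96, so the null
    # hypothesis that the two distributions are the same is rejected.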
Problem: A class of 9 students is given extra assistance in the Learning Center while a similar class of 10 students receives no such assistance. The final grades are given below:

    With assistance:    96, 67, 43, 83, 72, 68, 51, 87, 77
    Without assistance: 38, 38, 81, 64, 29, 64, 54, 50, 56, 61

Use the U-test to determine if the distributions are the same at the 5 per cent level of significance.

[Answer: The null hypothesis is rejected at the 5 per cent level of significance. Thus we conclude that the distributions are different. Since the mean grade of those who had assistance [71.56] is higher than that of those who did not [53.5], we can conclude that assistance from the Learning Center is beneficial.]

THE RUNS TEST

Is the data you are analyzing truly random? That is, are you sure that no bias, either intentional or accidental, entered into the sampling procedure? You can determine the randomness of the sample by means of the Runs Test, which is based on the order in which the data was collected.

A run is defined as an unbroken sub-sequence of identical symbols. For instance, the answers on a true-false test might be:

    T T F T F F T T F F T F

which, when broken down into runs, gives us:

    T T / F / T / F F / T T / F F / T / F

Thus we have 4 runs of T and 4 of F. If a sample has too few runs (e.g. T T T T T T T T T T T T = 1 run) or too many (for example T F T F T F T F T F T F = 12 runs), it would not be random. The Runs Test calculates the number of runs of each type, the mean and standard deviation of the number of runs for a random sample of the same size, and then calculates a statistic (a z score) upon which we can accept or reject the null hypothesis that the sequence is random.

Example: The correct answers to a true-false portion of a test are:

    T F F T F F T T T F F T F F T F F T T F
    T T T F F T T F F F T T T F T F T T F F
    T T F F T F T F T F T F T F F T F T T F
    F T T F F T

Is the sample random or is there a pattern? Using the Runs Test, we enter an appropriate name for the Data Set and indicate that the data are by class (see below) and that the two classes will be TRUE and FALSE. After entering the data as T and F (the program will automatically assign different symbols to the two classes, even if the classes are, for instance, TOUGH and TOUGHER) and concluding with an END, the results are calculated and we get the following:

    NAME OF DATA SET: TRUE FALSE TEST
    DATA IS BY CLASS

    #  CLASS EXAMINED   NUMBER
    1  TRUE             3.30E+01
    2  FALSE            3.30E+01

    NULL HYPOTHS: SEQUENCE IS RANDOM
    NUMBER OF RUNS: 41   TOTAL NO.: 66
    MEAN OF NUMBER OF RUNS: 3.40E+01
    STANDARD DEVIATION OF RUNS: 4.03E+00
    CALCULATED Z: 1.74E+00
    ACCEPT NULL HYPTHS AT 5% LEVEL OF SIG

We see that, at the 5 per cent level of significance, the null hypothesis is accepted. The sequence is random.

The program will also accept numerical data for analysis. In this case, the median is calculated and the data are examined for runs of values above or below the median.
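Counting runs is exactly what a grouping routine does. A minimal sketch in Python (not part of this package) for the true-false answer key above:

    from math import sqrt
    from itertools import groupby

    answers = ("T F F T F F T T T F F T F F T F F T T F "
               "T T T F F T T F F F T T T F T F T T F F "
               "T T F F T F T F T F T F T F F T F T T F "
               "F T T F F T").split()

    runs = sum(1 for _key, _group in groupby(answers))   # 41 runs
    n1 = answers.count("T")                              # 33
    n2 = answers.count("F")                              # 33
    n = n1 + n2

    # mean and standard deviation of the number of runs in a random sequence
    mean_runs = 2 * n1 * n2 / n + 1
    sd_runs = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1)))

    z = (runs - mean_runs) / sd_runs
    print(runs, mean_runs, round(z, 2))                  # 41, 34.0, z = 1.74
    # |z| = 1.74 is below the 5% critical value of 1.96, so the null
    # hypothesis that the sequence is random is accepted.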
Problem: A chemistry instructor believes that he need not grade his finals as, he believes, the first 10 per cent of the students handing in the tests will be F's, the next 10 per cent will be A's, the next 70 per cent will be B's and C's and the last 10 per cent will be F's again. Nonetheless, he does grade the exams and, in the order they were handed in, the grades were:

    26, 13, 38, 76, 65, 84, 44, 91, 78, 31, 68,
    86, 94, 41, 78, 84, 96, 12, 87, 43, 34, 78

Should the instructor's hypothesis be accepted or rejected at the 5 per cent level?

[Answer: The hypothesis, as stated, is too complex to analyze with these programs. However, when we do a Runs test on the data, we find that the sequence is randomly distributed about the median (at the 5 per cent level of significance), so the instructor's hypothesis must be rejected. However, he continues to believe it.]

RANK CORRELATION

The least squares method allows you to determine if two normally distributed variables are correlated. If they are not normally distributed, it is still possible to determine if there is a correlation, by means of the Rank Correlation. For example, we might wish to determine if the opinions of two judges of seven horses are correlated. The opinions of a judge need not be normally distributed, since it might well be possible, for example, that, in the judge's opinion, four of the horses are excellent (although different), two are mediocre (but again different) and the seventh is a real dog. While the horses can be ranked from 1 to 7, the distribution is not normal.

The rank correlation coefficient, r, was introduced by Spearman in 1904 to test if there is a correlation between ranked variables. This coefficient is the statistic that is examined to test the null hypothesis. The null hypothesis in this test, by the way, is that the two rankings are DIFFERENT.

For instance, suppose a group of faculty and a group of administrators independently rank twelve different applicants for the position of chief executive officer of a school. Here are the ranks:

    Applicant   Rank by Faculty   Rank by Administrators
    A                 6                     1
    B                 3                     5
    C                10                     3
    D                 7                    10
    E                 1                     9
    F                11                     4
    G                 5                     2
    H                 9                     7
    I                 2                     8
    J                 8                     6
    K                 4                    11
    L                12                    12

Are the rankings of the faculty and administrators the same or different? Running the Rank Correlation program, entering RANKING OF APPLICANTS for the Data Set Name, APPLICANT for the Variable, RANK BY FACULTY for the 1st Condition and RANK BY ADMINISTRATORS for the 2nd Condition, then entering the ranks and finally doing the calculation, we get:

    NAME OF DATA SET: RANKING OF APPLICANTS
    NAME OF VARIABLE: APPLICANT

    #  CONDITION EXAMINED        MEAN
    1  RANK BY FACULTY           6.50E+00
    2  RANK BY ADMINISTRATORS    6.50E+00

    NUMBER: 12
    NULL HYPTHS: CONDITIONS ARE DIFFERENT
    RANK CORRELATION COEFFICIENT: -5.59E-02
    STD. DEV. OF RANK CORR COEF: 3.02E-01
    CALCULATED Z: -1.86E-01
    ACCEPT NULL HYPTHS AT 5% LEVEL OF SIG

The rank correlation coefficient [-.0559] is very close to zero, which generally means that the null hypothesis should be accepted. Indeed, the calculated z score is small (the fact that it is negative is irrelevant) and the null hypothesis is accepted. Thus we would conclude that the rankings by the faculty and the rankings by the administrators are different.

Problem: The program will also allow you to enter numerical data (which it will then rank). For instance, suppose we wished to see if the IQ's of the members of married couples are correlated. Here's the data:

    COUPLE   HUSBAND   WIFE
    A           96      110
    B          114      118
    C          119       99
    D          134      137
    E          142      124
    F          109      108
    G          127      101
    H           99       98
    I          147      153
    J          125      138
    K          127      104

[Answer: When the rank correlation coefficient is calculated, we see that the null hypothesis should be rejected at the 2 per cent level; that is, the IQ's of the husband and the wife are correlated.]
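Spearman's coefficient comes straight from the squared differences between paired ranks. A minimal sketch in Python (not part of this package) for the applicant rankings above:

    from math import sqrt

    faculty = [6, 3, 10, 7, 1, 11, 5, 9, 2, 8, 4, 12]
    admin = [1, 5, 3, 10, 9, 4, 2, 7, 8, 6, 11, 12]
    n = len(faculty)

    # r = 1 - 6 * (sum of squared rank differences) / (n * (n^2 - 1))
    d_sq = sum((f - a) ** 2 for f, a in zip(faculty, admin))
    r = 1 - 6 * d_sq / (n * (n * n - 1))   # about -.0559

    sd = 1 / sqrt(n - 1)                   # about .302
    z = r / sd
    print(round(r, 4), round(z, 2))        # -.0559, z = -.19
    # |z| is far below the 5% critical value of 1.96, so the null hypothesis
    # (here, that the two rankings are different) is accepted.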
PROBABILITY

The Probability program on the Program menu will allow you to calculate the probability of a value, or range of values, in the binomial, normal and Poisson distributions.

For the binomial distribution, you enter the total number in the sample and the probability that a single event will occur. You then ask for the probability that either a specific number of events or a range of numbers will occur. For example, you could say you have 2 objects in your sample and that the probability of a single event happening is 0.5. If you ask what is the probability that 2 events will happen, the program will tell you it is 0.25. If the number of objects in the sample is greater than 30, either the normal or the Poisson approximation to the binomial distribution is used.

For the normal distribution, you must enter the mean and standard deviation. The other information is the same as for the binomial. The program will only calculate probabilities for values within plus or minus three standard deviations of the mean.

For the Poisson distribution, you must enter the mean. The probability of any number of events occurring will be calculated, but if the value is far from the mean, the calculation could take a long time, as factorials are being calculated.

FILE SUFFIXES

When you save a file to disk, the program will add a suffix to it as an identification tag. Here is a list of programs, suffixes and the type of data stored:

    Program              Suffix   Type of data
    MEAN                 .MEAN    Raw data
    MEAN                 .MEST    Analyzed data
    ONE WAY CHI SQUARE   .CS1D    Raw data
    TWO WAY CHI SQUARE   .CS2D    Raw data
    LEAST SQUARES        .LSF     Raw data
    MULTIPLE REGRESSION  .MTRG    Raw data
    SIGN                 .SIGNS   Raw data
    MANN WHITNEY U       .MWUT    Raw data
    RUNS                 .RUNS    Raw data
    RANK CORRELATION     .RCD     Raw data

APPLE II FOREVER!