Tables

Turn the numbers from the computer printout into tables. Except for the correlation matrix that is part of the reliability analysis, you will be required to make new tables, rather than simply xeroxing the output you get from the computer. This is to indicate to me that you KNOW what numbers you are looking at to answer the various questions in the report. It would be useful to bring "blank" tables for your variables, set up like those in the example, to class on the date indicated, so that you can copy your data into them. Tables should be numbered sequentially so you can refer to them in your report, should have titles, and should include labels for the variables so that a person can understand what the numbers in the table are without having to read the rest of your paper. BE SURE TO DOUBLE- AND TRIPLE-CHECK YOUR COPYING OF NUMBERS FROM THE PRINTOUT INTO THE TABLES after the class.

Inspect Your Data
Frequency distributions. There are several useful things to look for in the frequency distributions:

  - Problems of low variability. If 80% or more of the respondents chose one answer for an item, the variability of that variable is so low that statistical results with it are extremely problematic given small sample sizes. Results from such variables cannot be trusted, and they are often simply discarded. (A quick way to screen for this is sketched after this list.)
  - Lesser problems of low variability, where 60% or more of the cases choose one response, or where 80% or more of the cases fall in two adjacent responses (of a variable with more than three responses). These may just indicate a skew in the population, but might also indicate a biased sample or a biased question.
  - Comparisons across the different questions measuring the dependent variable, to see which items elicit the most favorable responses, and which the least favorable.
  - Information about the distribution of attitudes in the population. Because your samples are non-random, you have to interpret these results very cautiously, but it is still interesting to find out what people said.
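
If you have your coded data in a file, here is a quick way to screen for the 80% problem (a minimal sketch in Python/pandas; the file name and the item names are placeholders, not something handed out with the assignment):

```python
import pandas as pd

df = pd.read_csv("class_survey.csv")  # placeholder: your coded questionnaire data

# Flag items where a single response accounts for 80% or more of the non-missing cases.
for col in ["deter11", "state12", "safe13", "cand14"]:   # substitute your own item names
    proportions = df[col].value_counts(normalize=True)   # share of cases choosing each response
    if proportions.iloc[0] >= 0.80:
        print(f"{col}: {proportions.iloc[0]:.0%} chose one answer -- too little variability")
```
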
Reliability analysis and correlations among the questions measuring the dependent variable. Ideally, the reliability analysis should show all of the corrected item-total correlations (the correlation of each item with the sum of all the other items) to be moderately high and positive (better than .4 or .5), and the coefficient alpha to be over .7. Also, for each item, the output should state that removing the item would lead to a lower alpha. If all of these things are true, your items are all good. (A sketch of how these statistics are computed follows the list below.)

If you don't have the ideal, there are several possible patterns to check for in the correlation matrix itself:
  - One or two questions show low negative correlations and small positive ones with the other items; these questions have a low item-total correlation and an indication that alpha will go UP if they are removed. This probably means those questions are "bad" and the others are OK. You discard the bad questions and get a revised index using only the good ones.
  - There are subsets of questions that have moderate or strong positive correlations with each other but negative or weak correlations with the others. This means the questions seem to be tapping more than one conceptual variable. You try to see which set best captures what you had in mind and use those for your index. Occasionally, you decide that both sets are interesting, and create two indices, one for each.
  - There are a lot of strong negative correlations for one or two variables, and their item-total correlations are strongly negative. This usually means that you have forgotten to "reverse score" a question, or that subjects read the opposite meaning into a question than you intended. SEE ME.
  - Most correlations in the table are close to zero, negative ones are scattered across different questions, and the alpha and item-total correlations are low. This means that none of the items are properly related to any of the others, that there really is no single concept that your questions measure. This is the worst situation to be in, but it is rare. Most often, this turns out to be due to mistakes in coding the data, usually when partners code their data separately and turn out to be doing it differently. If you have data like this and you have checked for and ruled out coding errors, you definitely need to see me.
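
For those who want to see where these reliability numbers come from, here is a minimal sketch in Python/pandas of the two key statistics. It assumes a data frame `items` whose columns are your closed-ended questions, already reverse-scored where necessary; the data frame and column names are placeholders, not part of the printout.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Coefficient alpha for a set of items (rows = respondents, columns = items)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Correlation of each item with the sum of all the OTHER items."""
    return pd.Series({col: items[col].corr(items.drop(columns=col).sum(axis=1))
                      for col in items.columns})

# Example use, assuming df holds your coded questionnaire data:
# items = df[["deter11", "safe13", "cand14", "retrib15", "wrong17",
#             "cheap18", "fed19", "moral20", "final21", "apply22"]]
# print(cronbach_alpha(items))        # compare to the printed coefficient alpha
# print(corrected_item_total(items))  # compare to the Corrected Item-Total Correlation column
# "Alpha if item deleted" is simply cronbach_alpha(items.drop(columns=some_item)).
```
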
Look to see if the means of the INDEX get larger in the expected direction across categories of the OPEN-ENDED question, and that the analysis of variance test is significant at p < .01 or smaller. This should occur if the INDEX and the OPEN-ENDED question are both "valid" measures of your concept. If this does not happen, you try to figure out which measure is "bad." Keep in mind that the open-ended question might have a problem, or that the open-ended categories may be out of order.
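
A rough sketch of this check in Python (pandas and scipy), assuming a data frame with an `index_score` column (the summed index) and an `open1` column holding the open-ended codes; the file name and both column names are placeholders:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("class_survey.csv")  # placeholder: your coded questionnaire data

# Mean (with N and SD) of the index within each open-ended category, as in Table 4 below.
print(df.groupby("open1")["index_score"].agg(["count", "mean", "std"]))

# One-way analysis of variance of the index across the open-ended categories.
groups = [g["index_score"].dropna() for _, g in df.groupby("open1")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```
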
Tests of the "obvious" hypothesis using the index will employ either correlation analysis (if your independent variable is continuous) or analysis of variance (if the I.V. is categorical). Tests of the hypothesis using the open-ended question will use chi-squares. Look to see if the hypothesis is confirmed or not. You may get significant findings using both measures, nothing using either measure, or significant findings with one measure and not with the other.

Again, there are implications for the validity of each of these measures from these results. We will discuss how you interpret these in class.
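
As a sketch of what those tests look like in Python (scipy), continuing with the same hypothetical data frame: `libcon9` and `partycat` appear in the example tables below, while `index_score` and `open_grouped` are placeholder column names.

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("class_survey.csv")  # placeholder: your coded questionnaire data

# Continuous independent variable: Pearson correlation with the index.
sub = df[["libcon9", "index_score"]].dropna()
r, p = stats.pearsonr(sub["libcon9"], sub["index_score"])
print(f"r = {r:.3f}, p = {p:.4f}")

# Categorical independent variable: one-way analysis of variance (as in the sketch above).
groups = [g["index_score"].dropna() for _, g in df.groupby("partycat")]
print(stats.f_oneway(*groups))

# Open-ended question (collapsed) by the independent variable: chi-square on the cross-tab.
observed = pd.crosstab(df["partycat"], df["open_grouped"])
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.4f}")
```
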
The following are a set of tables, based on the student questionnaire presented a few pages back. In class you will be given the actual printouts from these data, just like the printouts you will later receive for your own data. After the tables, there is a guide to the output, titled "Numbers are our friends."

EXAMPLE TABLES

Univariate statistics. The computer printout provides a lot of statistics. The first few pages go into Tables 1 and 2.

Table 1. Frequency Distributions for Independent Variables (I show only four items for brevity of presentation)
 
        
| sex3 | freq | party4 | freq | relig5 | freq | libcon9 | freq |  
          | male | 30 | Repub | 22 | Catholic | 18 | Ext. Liberal | 1 |  
          | female | 29 | Dem | 16 | Jewish | 1 | 2 | 10 |  
          |  |  | Prog | 3 | Protestant | 14 | 3 | 10 |  
          |  |  | Ind | 9 | Buddhist | 1 | 4 | 1 |  
          |  |  | Green | 1 | Hindu | 1 | 5 | 6 |  
          |  |  | Libertar | 1 | Deist | 2 | 6 | 9 |  
          |  |  | Other | 7 | None | 19 | 7 | 5 |  
          |  |  |  |  | Other | 3 | 8 | 5 |  
          |  |  |  |  |  |  | 9 | 7 |  
|  |  |  |  |  |  | Ext. Conserv. | 1 |  

Note: The computer prints only values of the variable that have been chosen by respondents in the frequency tables. You should put in all values, because it helps you to compare the questions in terms of where people feel most favorable or unfavorable. Note that the tables do NOT report a mean for the open-ended question or the independent variables, even though the computer printed them, because in most cases it would be meaningless.

Table 2. Frequency Distributions and Means for Dependent Variables (I present only the first four variables and the OPEN code for brevity)
 
         
          | deter11 | f | state12 | f | safe13 | f | cand14 | f | OPEN1 | f |   
| 1 | 11 | 1 | 2 | 1 | 26 | 1 | 5 | strongly oppose | 22 |   
          | 2 | 19 | 2 | 8 | 2 | 6 | 2 | 11 | mod. oppose | 5 |   
          | 3 | 5 | 3 | 11 | 3 | 6 | 3 | 9 | ambig. | 8 |   
          | 4 | 8 | 4 | 7 | 4 | 5 | 4 | 6 | mod. favor | 10 |  
          | 5 | 12 | 5 | 19 | 5 | 9 | 5 | 9 | strongly favor | 14 |   
          | 6 | 4 | 6 | 12 | 6 | 7 | 6 | 19 |  |  |   
| mean | 3.05 |  | 4.17 |  | 2.76 |  | 4.02 |  | (inapp.) |  

Reliability analysis; correlations among the closed-ended questions. 
        The computer next printed out a triangular correlation matrix, which 
        you copy into Table 3. (You may paste the matrix, as I have here, but 
      you must provide a title.)
 Table 3: Correlations among closed-ended questions and with the total 
        scale       
         
          |  | deter11 | safe13 | cand14 | retrib15 | wrong17 | cheap18 |   
          | deter11 | 1.0000 |  |  |  |  |  |   
          | safe13 | .613 | 1.0000 |  |  |  |  |   
          | cand14 | .505 | .648 | 1.0000 |  |  |  |   
          | retrib15 | .533 | .522 | .609 | 1.0000 |  |  |   
          | wrong17 | .470 | .627 | .560 | .559 | 1.0000 |  |   
          | cheap18 | .307 | .516 | .550 | .561 | .477 | 1.0000 |   
          | fed19 | .085 | .249 | .213 | -.151 | -.083 | -.098 |   
          | moral20 | .669 | .743 | .714 | .737 | .692 | .676 |  
          | final21 | .660 | .623 | .604 | .710 | .638 | .523 |  
          | apply22 | .548 | .511 | .476 | .380 | .543 | .262 |    
         
          |  | fed19 | moral20 | final21 | apply22 |   
          | fed19 | 1.0000 |  |  |  |   
          | moral20 | .054 | 1.0000 |  |  |   
          | final21 | -.023 | .846 | 1.0000 |  |   
| apply22 | .158 | .494 | .546 | 1.0000 |  

N of Cases = 59.0

Item-total Statistics 
        
          |  | Scale Mean If Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Squared Multiple Correlation | Alpha if Item Deleted |  
          | deter11 | 29.1844 | 135.506 | .680 | .583 | .892 |  
          | safe13 | 29.4725 | 126.763 | .784 | .669 | .884 |  
          | cand14 | 28.2183 | 130.151 | .758 | .610 | .886 |  
          | retrib15 | 28.7683 | 128.557 | .693 | .632 | .891 |  
          | wrong17 | 28.6420 | 135.123 | .698 | .601 | .891 |  
          | cheap18 | 29.7268 | 138.933 | .582 | .547 | .898 |  
          | fed19 | 29.2692 | 159.576 | .057 | .332 | .924 |  
          | moral20 | 29.3878 | 124.042 | .895 | .865 | .877 |  
          | final21 | 29.0827 | 127.008 | .808 | .767 | .883 |  
| apply22 | 29.3653 | 137.301 | .592 | .467 | .897 |  

Reliability Analysis - Scale (Alpha). Reliability Coefficients, 10 Items: Cronbach's Alpha = .903, Standardized Item Alpha = .898

We will discuss in class how you would interpret these numbers and what their implications are for validity.
Bivariate association between open-ended question and index. The computer gives you the mean for the index separately for each category of the open-ended question.

Table 4. Mean of the Index for Each Category of the Open-ended Question. 
         
| OPEN1 | N | Mean | Standard Deviation | F | p-value |   
          | Strongly Oppose | 22 | 21.65 | 5.547 |  |  |   
          | Mod. Oppose | 5 | 25.00 | 5.788 | 33.45 | .001 |  
| Ambiguous | 8 | 30.94 | 13.157 |  |  |  
          | Mod. Favor | 10 | 37.10 | 6.280 |  |  |   
| Strongly Favor | 14 | 48.71 | 5.470 |  |  |  

Tests of "obvious" hypothesis and other hypotheses.
 Table 5 a. Mean of Index Separately by Political Party Preference (Obvious hypothesis). 
         
          | partycat | Mean Index | N | Standard Dev. | F | p-value |   
          | Repub. or libertarian | 41.55 | 23 | 10.93 | 8.872 | .001 |  
          | Democrat | 27.00 | 16 | 12.23 |  |  |  
          | Prog. or green | 18.58 | 4 | 3.47 |  |  |  
          | Independ | 24.11 | 9 | 6.57 |  |  |   
| Other | 31.86 | 7 | 9.05 |  |  |    

Table 5 b. Mean of Index Separately by Religion 
        
          | religcat | Mean Index | N | Standard Dev. | F | p-value |  
          | Catholic | 35.22 | 18 | 10.47 | 4.937 | .004 |  
          | Protestant | 40.00 | 14 | 13.83 |  |  |  
          | None | 25.47 | 19 | 10.93 |  |  |  
| Deist, Jewish, Hindu, Buddhist, Other | 28 | 8 | 11.90 |  |  |  

Table 6 a. Test of obvious hypothesis using open-ended question. Percentages of subjects in each of the open-ended coding categories (collapsed) in relationship to political party (further collapsed).
 
         
          |  | OPEN (grouped) |   
| Party (trichotomy) | Codes 1-2 (Oppose) | Code 3 (Ambiguous) | Codes 4-5 (Favor) |   
          | Repub. + Libertar (23) | 13% (3) | 43.5% (10) | 43.5% (10) |   
| Democrat (16) | 56.3% (9) | 25% (4) | 45% (9) |   
          | other (20) | 50% (10) | 45% (9) | 5% (1) |   
| Chi-square; p-value | 14.047; p < .007 |  
 Table 6 b. Test of second hypothesis, using open-ended answers. Percentage of respondents in each category of the open-ended question (collapsed) by categories of religion. 
        
          |  | OPEN (grouped) |  
          | Religion (four categories) | Codes 1-2 (Oppose) | Code 3 (Ambiguous) | Codes 4-5 (Support) |  
          | Catholic | 27.8% (5) | 50% (9) | 22.2% (4) |  
          | Protestant | 28.6% (4) | 14.3% (2) | 57.1% (8) |  
          | None | 42.1% (8) | 52.6% (10) | 5.3% (1) |  
          | Deist, Jewish, Hindu, Buddhist, other | 62.5% (5) | 25% (2) | 12.5% (1) |  
| Chi-square, p-value | 16.024; p < .014 |    

*** END OF EXAMPLE ***

NUMBERS ARE OUR FRIENDS: A BRIEF STATISTICS GUIDE
 Some statistical concepts:
Pearson product-moment correlations, analysis of variance, and chi-square are three of many ways to assess how likely it is that an observed pattern of relationships between variables is due to chance.

Correlations can run from +1.00 through 0 to -1.00. A correlation indicates the extent to which one variable is associated with a second; alternatively, one can see it as the extent to which you can predict one variable from the other. The larger the correlation (in absolute value), the better the prediction. The significance of a correlation -- the likelihood that it represents a "real" relationship between the two variables -- can be calculated. It is based in part on the size of the correlation and in part upon the sample size.
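
As a small illustration of that last point, this sketch computes the two-tailed p-value for a given correlation at two different sample sizes (using the standard t statistic with n - 2 degrees of freedom; the particular values of r and n are arbitrary):

```python
from math import sqrt
from scipy import stats

def correlation_p_value(r: float, n: int) -> float:
    """Two-tailed p-value for a Pearson correlation r from a sample of size n."""
    t = r * sqrt((n - 2) / (1 - r ** 2))   # t statistic with n - 2 degrees of freedom
    return 2 * stats.t.sf(abs(t), df=n - 2)

# The same size of correlation can be far from significant in a small sample
# and highly significant in a large one.
print(correlation_p_value(0.30, 20))    # roughly .20
print(correlation_p_value(0.30, 200))   # well below .001
```
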
Analysis of variance (and the t-test, a special case involving only two sample groups) tests whether it is reasonable to assume that the sample means on a continuous variable for several different groups come from different populations. That is, can we conclude with some confidence that there is something about the groups that is "really" different, as opposed to having to accept that the apparent mean differences are due to chance?
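
To see the "special case" point concretely: with only two groups, the F from an analysis of variance is exactly the square of the t statistic, and the two tests give the same p-value. A sketch, assuming the same hypothetical data frame as in the earlier sketches, with a `sex3` column coded 1 = male, 2 = female:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("class_survey.csv")            # placeholder: your coded questionnaire data
men = df.loc[df["sex3"] == 1, "index_score"].dropna()
women = df.loc[df["sex3"] == 2, "index_score"].dropna()

t_result = stats.ttest_ind(men, women)          # independent-samples t-test (equal variances)
f_result = stats.f_oneway(men, women)           # one-way analysis of variance, same two groups

print(t_result.statistic ** 2, f_result.statistic)  # F equals t squared
print(t_result.pvalue, f_result.pvalue)             # identical p-values
```
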
Chi-square tests whether the distribution of cases in a contingency table (a cross-tabulation of two nominal or ordinal variables) deviates from what we would expect the distribution to be based on chance alone.
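
Here "expected based on chance alone" means the cell counts you would get if the two variables were unrelated: each expected count is the row total times the column total, divided by the overall N. A sketch, again with placeholder file and column names:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("class_survey.csv")                        # placeholder: your coded data
observed = pd.crosstab(df["partycat"], df["open_grouped"])  # observed cell counts

chi2, p, dof, expected = stats.chi2_contingency(observed)

# Expected counts under independence; chi-square summarizes how far the
# observed counts deviate from them.
print(pd.DataFrame(expected, index=observed.index, columns=observed.columns).round(1))
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.4f}")
```
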
These tests of "significance" are based on ratios that are calculated and then looked up in tables. (Nowadays, the computer 
          does all of this for you.) You need not understand the calculations 
          to understand the concept. What the level of significance tells you 
          is what proportion of times you would expect to find as strong a relationship 
          as is indicated by the data simply by the operation of chance factors. 
          The larger the obtained ratio, and the smaller the p-value, the less 
          likely it is that only chance is operating, and, therefore, the more 
          likely it is that the relationship represents some systematic association 
between the variables being tested.

We do not say that we have "proved" that a relationship 
          exists, because we can only make probabilistic statements. The size 
          of the p-value gives us only more or less confidence that the 
          relationships we observe represent something meaningful. 