Jane Allyn Piliavin - Sociology at UW Madison

Tables

Turn the numbers from the computer printout into tables

Except for the correlation matrix that is part of the reliability analysis, you will be required to make new tables, rather than simply xeroxing the output you get from the computer. This is to indicate to me that you KNOW what numbers you are looking at to answer the various questions in the report. It would be useful to bring "blank" tables for your variables set up like those in the example to class on the date indicated, so that you can copy your data into them. Tables should be numbered sequentially so you can refer to them in your report, should have titles, and should include labels for the variables so that a person can understand what the numbers in the table are without having to read the rest of your paper. BE SURE TO DOUBLE- AND TRIPLE-CHECK YOUR COPYING OF NUMBERS FROM THE PRINTOUT INTO THE TABLES after the class.

Inspect Your Data

  1. Frequency distributions. There are several useful things to inspect the frequencies for.
    1. Problems of low variability. If 80% or more of the respondents chose one answer for an item, the variability of that variable is so low that statistical results with it are extremely problematic given small sample sizes. Results from such variables cannot be trusted, and they are often simply discarded.
    2. Lesser problems of low variability, where 60% or more of the cases choose one response, or where 80% or more of the cases are in two adjacent responses (of a variable with more than three responses). These may just indicate a skew in the population, but might also indicate a biased sample or a biased question.
    3. Comparisons across the different questions measuring the dependent variable, to see which items elicit the most favorable responses, and which the least favorable.
    4. Information about the distribution of attitudes in the population. Because your samples are non-random, you have to interpret these results very cautiously, but it is still interesting to find out what people said.
  2. Reliability analysis and correlations among the questions measuring the dependent variable. Ideally, the reliability analysis should show all of the corrected item-total correlations (the correlation of each item with the sum of all the other items) to be moderately high and positive (better than .4 or .5), and the coefficient alpha to be over .7. Also, for each item, the output should state that removing the item would lead to a lower alpha. If all of these things are true, your items are all good.
    If you don't have the ideal, there are several possible patterns to check for in the correlation matrix itself:
    1. One or two questions show low negative correlations and small positive ones with the other items; these questions have a low item-total correlation and an indication that alpha will go UP if they are removed. This probably means those questions are "bad" and the others are OK. You discard the bad questions and get a revised index using only the good ones.
    2. There are subsets of questions that have moderate or strong positive correlations with each other but negative or weak correlations with the others. This means the questions seem to be tapping more than one conceptual variable. You try to see which set most captures what you had in mind and use those for your index. Occasionally, you decide that both sets are interesting, and create two indices, one for each.
    3. There are a lot of strong negative correlations for one or two variables, and their item-total correlations are strongly negative. This usually means that you have forgotten to "reverse score" a question, or that subjects read the opposite meaning into a question than you intended. SEE ME.
    4. Most correlations in the table are close to zero, negative ones are scattered across different questions, and the alpha and item-total correlations are low. This means that none of the items are properly related to any of the others, and that there really is no single concept that your questions measure. This is the worst situation to be in, but it is rare. Most often, this turns out to be due to mistakes in coding the data, usually when partners code their data separately and turn out to be doing it differently. If you have data like this and you have checked for and ruled out coding errors, you definitely need to see me.
  3. Look to see whether the means of the INDEX get larger in the expected direction across categories of the OPEN-ENDED question, and whether the analysis of variance test is significant at p < .01 or smaller. This should occur if the INDEX and the OPEN-ENDED question are both "valid" measures of your concept. If this does not happen, you try to figure out which measure is "bad." Keep in mind that the open-ended question might have a problem, or that the open-ended categories may be out of order.
  4. Tests of the "obvious" hypothesis using the index will employ either correlation analysis (if your independent variable is continuous) or analysis of variance (if your independent variable is categorical). Tests of the hypothesis using the open-ended question will use chi-squares. Look to see if the hypothesis is confirmed or not. You may get
    1. significant findings using both measures,
    2. nothing using either measure, or
    3. significant findings with one measure and not with the other.

Again, there are implications for the validity of each of these measures from these results. We will discuss how you interpret these in class.
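
The frequency screening rules in step 1 can be written down as a small check. This is only a sketch in Python: the function name and the labels "severe", "lesser", and "ok" are made up for illustration, and the cutoffs are the 80% and 60% figures given above.

```python
def variability_flag(freqs):
    """Classify one item's frequency distribution by the step 1 rules.

    freqs: counts for each response category, listed in response order.
    Returns "severe", "lesser", or "ok" (labels are illustrative only).
    """
    total = sum(freqs)
    if total == 0:
        raise ValueError("no responses")
    shares = [count / total for count in freqs]
    # Rule 1: 80% or more of the respondents chose one answer.
    if max(shares) >= 0.80:
        return "severe"
    # Rule 2, first part: 60% or more of the cases chose one response.
    if max(shares) >= 0.60:
        return "lesser"
    # Rule 2, second part: 80% or more of the cases are in two adjacent
    # responses, for a variable with more than three responses.
    if len(freqs) > 3:
        adjacent = [shares[i] + shares[i + 1] for i in range(len(shares) - 1)]
        if max(adjacent) >= 0.80:
            return "lesser"
    return "ok"


# The first distribution is safe13 from Table 2 in the example;
# the other two are invented to trigger each rule.
print(variability_flag([26, 6, 6, 5, 9, 7]))
print(variability_flag([50, 5, 2, 2]))
print(variability_flag([20, 28, 5, 3, 3]))
```

A flag of "lesser" is a prompt to look at the item, not a verdict; as noted above, it may just reflect a skew in the population.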

The following is a set of tables, based on the student questionnaire presented a few pages back. In class you will be given the actual printouts from these data, just like the printouts you will later receive for your own data. After the tables, there is a guide to the output, titled "Numbers are our friends."

EXAMPLE TABLES for Univariate statistics. The computer printout provides a lot of statistics. The first few pages go into Tables 1 and 2.

Table 1. Frequency Distributions for Independent Variables
(I show only four items for brevity of presentation)

4sex31  freq   party4    freq   relig5      freq   libcon9        freq
male      30   Repub       22   Catholic      18   Ext. Liberal      1
female    29   Dem         16   Jewish         1   2                10
               Prog         3   Protestant    14   3                10
               Ind          9   Buddhist       1   4                 1
               Green        1   Hindu          1   5                 6
               Libertar     1   Deist          2   6                 9
               Other        7   None          19   7                 5
                                Other          3   8                 5
                                                   9                 7
                                                   Ext. Conserv.     1

Note: The computer prints only values of the variable that have been chosen by respondents in the frequency tables. You should put in all values because it helps you to compare the questions in terms of where people feel most favorable or unfavorable. Note that the tables do NOT report a mean for the open-ended question, or the independent variables, even though the computer printed them, because in most cases it would be meaningless.

Table 2. Frequency Distributions And Means for Dependent Variables.
(I present only the first four variables and the OPEN code for brevity)

deter11   f   state12   f   safe13    f   cand14    f   OPEN1             f
1        11   1         2   1        26   1         5   strongly oppose  22
2        19   2         8   2         6   2        11   mod. oppose       5
3         5   3        11   3         6   3         9   ambig.            8
4         8   4         7   4         5   4         6   mod. favor       10
5        12   5        19   5         9   5         9   strongly favor   14
6         4   6        12   6         7   6        19

mean   3.05          4.17          2.76          4.02   (inapp.)
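
One way to double-check numbers copied into a table like Table 2 is to recompute the N and the mean of each item from its frequencies. The sketch below does this in Python for the four closed-ended items shown; the function and variable names are illustrative, not part of the printout.

```python
def mean_from_frequencies(freqs):
    """Mean of a coded item, given {response code: number of respondents}."""
    n = sum(freqs.values())
    return sum(code * count for code, count in freqs.items()) / n


# Frequencies copied from Table 2.
table2 = {
    "deter11": {1: 11, 2: 19, 3: 5, 4: 8, 5: 12, 6: 4},
    "state12": {1: 2, 2: 8, 3: 11, 4: 7, 5: 19, 6: 12},
    "safe13": {1: 26, 2: 6, 3: 6, 4: 5, 5: 9, 6: 7},
    "cand14": {1: 5, 2: 11, 3: 9, 4: 6, 5: 9, 6: 19},
}

for name, freqs in table2.items():
    # Each line shows the item, its N, and its mean rounded to two places.
    print(name, sum(freqs.values()), round(mean_from_frequencies(freqs), 2))
```

If a recomputed N or mean disagrees with the printout, a frequency was probably miscopied.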


Reliability analysis; correlations among the closed-ended questions. The computer next printed out a triangular correlation matrix, which you copy into Table 3. (You may paste the matrix, as I have here, but you must provide a title.)

Table 3: Correlations among closed-ended questions and with the total scale

            deter11   safe13   cand14 retrib15  wrong17  cheap18
deter11      1.0000
safe13         .613   1.0000
cand14         .505     .648   1.0000
retrib15       .533     .522     .609   1.0000
wrong17        .470     .627     .560     .559   1.0000
cheap18        .307     .516     .550     .561     .477   1.0000
fed19          .085     .249     .213    -.151    -.083    -.098
moral20        .669     .743     .714     .737     .692     .676
final21        .660     .623     .604     .710     .638     .523
apply22        .548     .511     .476     .380     .543     .262

              fed19  moral20  final21  apply22
fed19        1.0000
moral20        .054   1.0000
final21       -.023     .846   1.0000
apply22        .158     .494     .546   1.0000

N of Cases= 59.0

Item-total Statistics

           Scale Mean   Scale Variance   Corrected     Squared       Alpha
           if Item      if Item          Item-Total    Multiple      if Item
           Deleted      Deleted          Correlation   Correlation   Deleted
deter11    29.1844      135.506          .680          .583          .892
safe13     29.4725      126.763          .784          .669          .884
cand14     28.2183      130.151          .758          .610          .886
retrib15   28.7683      128.557          .693          .632          .891
wrong17    28.6420      135.123          .698          .601          .891
cheap18    29.7268      138.933          .582          .547          .898
fed19      29.2692      159.576          .057          .332          .924
moral20    29.3878      124.042          .895          .865          .877
final21    29.0827      127.008          .808          .767          .883
apply22    29.3653      137.301          .592          .467          .897
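
The two checks described under "Inspect Your Data" (a corrected item-total correlation better than about .4, and an alpha that would not rise if the item were removed) can be applied to these item-total statistics mechanically. The Python sketch below is one way to do it; the names are illustrative, the .4 cutoff is the lower of the two values suggested earlier, and .903 is the alpha reported for the full ten-item scale.

```python
# (item, corrected item-total correlation, alpha if item deleted),
# copied from the item-total statistics in the example.
ITEM_TOTAL = [
    ("deter11", 0.680, 0.892),
    ("safe13", 0.784, 0.884),
    ("cand14", 0.758, 0.886),
    ("retrib15", 0.693, 0.891),
    ("wrong17", 0.698, 0.891),
    ("cheap18", 0.582, 0.898),
    ("fed19", 0.057, 0.924),
    ("moral20", 0.895, 0.877),
    ("final21", 0.808, 0.883),
    ("apply22", 0.592, 0.897),
]
OVERALL_ALPHA = 0.903  # alpha for all ten items


def weak_items(rows, overall_alpha, min_item_total=0.4):
    """Return (item, reasons) for items that fail either check."""
    flagged = []
    for name, item_total, alpha_if_deleted in rows:
        reasons = []
        if item_total < min_item_total:
            reasons.append("low item-total correlation")
        if alpha_if_deleted > overall_alpha:
            reasons.append("alpha rises if deleted")
        if reasons:
            flagged.append((name, reasons))
    return flagged


for name, reasons in weak_items(ITEM_TOTAL, OVERALL_ALPHA):
    print(name, "-", "; ".join(reasons))
```

With these numbers only fed19 is flagged, on both checks. Whether to drop a flagged item is still a judgment call that depends on the pattern in the correlation matrix.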

 

Reliability Analysis - Scale (Alpha)

Reliability Coefficients 10 Items

Cronbach's Alpha=.903, Standardized Item Alpha=.898
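
The standardized item alpha can be recovered from the correlation matrix alone, using the usual formula alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the mean of the correlations between items. The Python sketch below applies it to the correlations in Table 3; the names are illustrative. The unstandardized alpha (.903) cannot be recomputed this way, because it depends on item variances and covariances that the tables do not show.

```python
# Lower triangle of Table 3 without the 1.0000 diagonal. Rows follow the
# printed item order: safe13, cand14, retrib15, wrong17, cheap18, fed19,
# moral20, final21, apply22 (deter11 has no entries below the diagonal).
LOWER_TRIANGLE = [
    [0.613],
    [0.505, 0.648],
    [0.533, 0.522, 0.609],
    [0.470, 0.627, 0.560, 0.559],
    [0.307, 0.516, 0.550, 0.561, 0.477],
    [0.085, 0.249, 0.213, -0.151, -0.083, -0.098],
    [0.669, 0.743, 0.714, 0.737, 0.692, 0.676, 0.054],
    [0.660, 0.623, 0.604, 0.710, 0.638, 0.523, -0.023, 0.846],
    [0.548, 0.511, 0.476, 0.380, 0.543, 0.262, 0.158, 0.494, 0.546],
]


def standardized_alpha(lower_triangle):
    """Standardized alpha from the mean inter-item correlation."""
    k = len(lower_triangle) + 1  # number of items
    correlations = [r for row in lower_triangle for r in row]
    mean_r = sum(correlations) / len(correlations)
    return k * mean_r / (1 + (k - 1) * mean_r)


print(round(standardized_alpha(LOWER_TRIANGLE), 3))
```

The result agrees with the printed standardized alpha to three decimals, which is also a quick check that the matrix was copied correctly.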


We will discuss in class how you would interpret these numbers and what their implications are for validity.

Bivariate association between open-ended question and index. The computer gives you the mean for the index separately for each category of the open-ended question.

Table 4. Mean of the Index for Each Category of the Open-ended Question.

OPEN1             N    Mean    Standard Deviation   F       p-value
Strongly Oppose   22   21.65    5.547               33.45   .001
Mod. Oppose        5   25.00    5.788
Ambiguous          8   30.94   13.157
Mod. Favor        10   37.10    6.280
Strongly Favor    14   48.71    5.470
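
The F ratio in a table like this can be reconstructed from the group sizes, means, and standard deviations alone, which is a useful check on the copying. The Python sketch below does so for Table 4. It assumes the printed standard deviations are sample standard deviations (computed with N - 1), and the function name is illustrative.

```python
def anova_from_summary(groups):
    """One-way ANOVA F ratio from per-group (n, mean, standard deviation)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: how far group means sit from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares, rebuilt from each sample standard deviation.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# (N, mean, standard deviation) for each OPEN1 category in Table 4.
TABLE4 = [
    (22, 21.65, 5.547),
    (5, 25.00, 5.788),
    (8, 30.94, 13.157),
    (10, 37.10, 6.280),
    (14, 48.71, 5.470),
]

f_ratio, df_between, df_within = anova_from_summary(TABLE4)
print(round(f_ratio, 2), df_between, df_within)
```

Because the means and standard deviations in the table are rounded, the recomputed F should come out close to the printed value rather than identical to it.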


Tests of "obvious" hypothesis and other hypotheses.

Table 5 a. Mean of Index Separately by Political Party Preference (Obvious hypothesis).

partycat                Mean Index   N    Standard Dev.   F       p-value
Repub. or libertarian   41.55        23   10.93           8.872   .001
Democrat                27.00        16   12.23
Prog. or green          18.58         4    3.47
Independ                24.11         9    6.57
Other                   31.86         7    9.05

 

Table 5 b. Mean of Index Separately by Religion

religcat                                Mean Index   N    Standard Dev.   F       p-value
Catholic                                35.22        18   10.47           4.937   .004
Protestant                              40.00        14   13.83
None                                    25.47        19   10.93
Deist, Jewish, Hindu, Buddhist, Other   28            8   11.90


Table 6 a. Test of obvious hypothesis using open-ended question. Percentages of subjects in each of the open-ended coding categories (collapsed), by political party (further collapsed).

                         OPEN (grouped)
Party (trichotomy)       Codes 1-2 (Oppose)   Code 3 (Ambiguous)   Codes 4-5 (Favor)
Repub. + Libertar (23)   13% (3)              43.5% (10)           43.5% (10)
Democrat (16)            56.3% (9)            25% (4)              18.8% (3)
Other (20)               50% (10)             45% (9)              5% (1)
Chi-square; p-value      14.047; p < .007


Table 6 b. Test of second hypothesis, using open-ended answers. Percentage of respondents in each category of the open-ended question (collapsed) by categories of religion.

                                        OPEN (grouped)
Religion (four categories)              Codes 1-2 (Oppose)   Code 3 (Ambiguous)   Codes 4-5 (Support)
Catholic                                27.8% (5)            50% (9)              22.2% (4)
Protestant                              28.6% (4)            14.3% (2)            57.1% (8)
None                                    42.1% (8)            52.6% (10)           5.3% (1)
Deist, Jewish, Hindu, Buddhist, other   62.5% (5)            25% (2)              12.5% (1)
Chi-square; p-value                     16.024; p < .014
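
The chi-square in Table 6 b can be recomputed from the cell counts in parentheses. The Python sketch below compares each observed count with the count expected if religion and the open-ended answer were unrelated. The function names are illustrative, and the p-value helper uses a closed form that holds only when the degrees of freedom are even, as they are here (6); it is not a general routine.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell from chance alone.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_even_df(stat, df):
    """Upper-tail p-value; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form needs a positive even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Counts from Table 6 b: rows are Catholic, Protestant, None, and the
# combined "other" group; columns are Oppose, Ambiguous, Support.
TABLE6B = [
    [5, 9, 4],
    [4, 2, 8],
    [8, 10, 1],
    [5, 2, 1],
]

stat, df = chi_square(TABLE6B)
p_value = chi_square_p_even_df(stat, df)
print(round(stat, 3), df, round(p_value, 3))
```

Recomputing from counts rather than percentages avoids rounding error, and it will also expose a miscopied cell, since the counts in each row must add up to that group's N.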

 

*** END OF EXAMPLE***


NUMBERS ARE OUR FRIENDS: A BRIEF STATISTICS GUIDE

Some statistical concepts:

  • Pearson product-moment correlations, analysis of variance, and chi-square are three of many ways to assess whether patterns of relationships between variables are unlikely to be due to chance.
  • Correlations can run from +1.00 through 0 to -1.00. Correlations indicate the extent to which one variable is associated with a second. Alternatively, one can see it as the extent to which you can predict one variable from the other. The larger the absolute value of the number, the better the prediction. The significance of a correlation -- the likelihood that it represents a "real" relationship between the two variables -- can be calculated. It is based in part on the size of the correlation and in part upon the sample size.
  • Analysis of variance (and t-test, a special case involving only two sample groups) tests whether it is reasonable to assume that the sample means on a continuous variable for several different groups come from different populations. That is, can we conclude with some confidence that there is something about the groups that is "really" different, as opposed to having to accept that the apparent mean differences are due to chance?
  • Chi-square tests whether the distribution of cases in a contingency table (a cross-tabulation of two nominal or ordinal variables) deviates from the distribution we would expect based on chance alone.
  • These tests of "significance" are based on ratios that are calculated and then looked up in tables. (Nowadays, the computer does all of this for you.) You need not understand the calculations to understand the concept. What the level of significance tells you is what proportion of times you would expect to find as strong a relationship as is indicated by the data simply by the operation of chance factors. The larger the obtained ratio, and the smaller the p-value, the less likely it is that only chance is operating, and, therefore, the more likely it is that the relationship represents some systematic association between the variables being tested.
  • We do not say that we have "proved" that a relationship exists, because we can only make probabilistic statements. The size of the p-value gives us only more or less confidence that the relationships we observe represent something meaningful.
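
To see how the significance of a correlation depends on sample size as well as on the size of the correlation, the Python sketch below computes an approximate p-value for the same correlation at three sample sizes. It uses the Fisher z transformation with a normal approximation, which is not necessarily the method your printout uses; the function name and the example value r = .30 are chosen only for illustration.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a Pearson r from a sample of size n."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    # Fisher z: atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same correlation, r = .30, at three sample sizes.
for n in (20, 59, 200):
    print(n, round(correlation_p_value(0.30, n), 4))
```

The p-value shrinks as the sample grows even though the correlation itself does not change, which is why a modest correlation can be "significant" in a large sample and not in a small one.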


Questions? Comments? Please contact jpiliavi@ssc.wisc.edu
