Spring 2015 Midterm
https://users.ssc.wisc.edu/~ewfrees/regression/exercises/spring-2015-midterm/

A chief executive officer (CEO) is the leader of a firm or organization. The CEO leads by developing and implementing a strategic policy for the firm. The CEO is in charge of a management team that is responsible for daily firm operations, financial strength, and corporate social responsibilities.

The CEO also leads the firm in compensation. Generally, a CEO is the most highly paid person in a firm; CEO salaries are at the top of the pyramid. Although some industries have employees whose salaries exceed the CEO's, for example sales agents, the broad rule is that CEO salaries form an effective upper bound for employee compensation. Thus, although very few managers ever become chief executive officers, there is a great deal of interest in CEO salaries. CEO compensation indirectly influences salaries for a large portion of the firm's workforce.

CEO salaries in the United States are of interest because of their relationship to salaries in international firms and to salaries of people who do not belong to Corporate America. Top managers in the United States have come under a great deal of criticism for being so highly paid compared to their international counterparts. Yet compensation of CEOs may not be out of line compared to top professionals in other fields. For example, Linden and Machan (1992, "Put Them at Risk!" Forbes Magazine, p. 158) compare CEO salaries with those of professionals such as actors, models, surgeons, and sports personalities, and find the compensation comparable.

Measuring annual compensation for a CEO is fraught with difficulties. Compensation clearly includes salary plus bonuses, that is, cash payments that may or may not be performance related. Other compensation is more difficult to measure and may include restricted stock awards and contributions to retirement, health insurance, and other employee benefit plans. Remuneration may also come in the form of stock gains based on the CEO's stock ownership or exercise of stock options, although we did not consider this source of income.

The data for this study were drawn from the May 25, 1992 issue of Forbes Magazine, in the article "What 800 Companies Paid for their Bosses." This article provides several measures of CEO compensation, as well as characteristics of the CEO and measures of his firm's performance. We say "his" because, of the 800 CEOs studied in this article, only one was a woman. The goal of this report is to study CEO and firm characteristics to determine the important factors influencing CEO compensation.

To understand the determinants of CEO compensation, one hundred observations were randomly selected from the 800 listed in the Forbes article. Although the Forbes article did not cite the basis for a firm to be included in its survey, the 800 companies seem to represent the largest publicly traded companies in the United States. Our sample of one hundred CEOs and their firms represents a cross-sectional sample of America's largest corporations. In our cross-section, the CEO and firm characteristics were based on 1991 measures.

Table 1 provides variable definitions.
$$
{\scriptsize
\begin{matrix}{\large \text{Table 1. Variable Definitions} }\\
\begin{array}{ll} \hline
\text{Variable} & \text{Definition} \\
\hline
\text{COMP}    &   \text{Sum of salary, bonus and other 1991 compensation, in thousands of dollars.}    \\
               &   ~~~\text{Other compensation does not include stock gains.}   \\
\text{AGE}     &   \text{CEO's age, in years}  \\
\text{SALES}   &   \text{1991 sales revenues, in millions of dollars} \\
\text{TENURE}  &   \text{Number of years employed by the firm}    \\
\text{EXPER}   &   \text{Number of years as the firm's CEO} \\
\text{VAL}     &   \text{Market value of the CEO's stock, in thousands of dollars}   \\
\text{PCTOWN}  &   \text{Percentage of the firm's market value owned by the CEO} \\
\text{PROF}    &   \text{1991 profits of the firm, before taxes, in millions of dollars}  \\
\text{EDUCATN} &   \text{Education level:} \\
               &  0 \text{ indicates that the CEO does not have an undergraduate degree} \\
               &  1 \text{ indicates that the CEO has only an undergraduate degree} \\
               &  2 \text{ indicates that the CEO has a graduate degree} \\
\text{BACKGRD} &  \text{Categorical variable indicating the professional background of the CEO}  \\ \hline \\
\end{array}
\end{matrix}
}
$$

Part I. Preliminary Summarization.

1. From a preliminary examination of the data, the 51st observation had an unusually low compensation. This was Craig McCaw, CEO of McCaw Cellular, who reported a salary of $155,000 in 1991, despite a five-year total reported salary of over fifty-three million dollars. As founder of McCaw Cellular, Mr. McCaw received a substantial amount of remuneration outside of the figures reported in 1991. Omit him from the sample.

Solution

R-Code
# file.choose() opens a file dialog on any platform
CEO100 <- read.csv(file.choose(), header=TRUE)
#fix(CEO100)
#summary(CEO100)
# I.1 Remove McCaw
CEO <- subset(CEO100, COMPANY != "mccaw")
attach(CEO)

2. Create the variables LOGCOMP, the natural logarithm of COMP; LOGSALES, the natural logarithm of SALES; and LOGVAL, the natural logarithm of VAL.

- 2a. Create histograms of COMP and LOGCOMP; compare the two distributions, commenting in particular on the effect that the logarithmic transformation has on the symmetry.
- 2b. Do this also for SALES and VAL.

Solution

2a. The COMP (compensation) distribution is skewed to the right and fat-tailed. The logarithmic transformation serves to symmetrize the distribution and "pull in" those individuals with large levels of compensation.
2b. The same is true for SALES and VAL; both distributions are right-skewed and fat-tailed.
The VAL distribution is very right-skewed.

R-Code
LOGCOMP  <- log(COMP)
LOGSALES <- log(SALES)
LOGVAL   <- log(VAL)
par(mfrow=c(1, 2))
hist(COMP, main=""); hist(LOGCOMP, nclass=10, main="")
hist(SALES, main=""); hist(LOGSALES, nclass=10, main="")
hist(VAL, main=""); hist(LOGVAL, nclass=10, main="")

R-Code Output
[Figure: Histograms of Compensation and Logarithmic Compensation]
[Figure: Histograms of Sales and Logarithmic Sales]
[Figure: Histograms of Market Value and Logarithmic Market Value]

3. Compute summary statistics of the continuous variables COMP, LOGCOMP, AGE, SALES, LOGSALES, TENURE, EXPER, VAL, LOGVAL, PCTOWN, and PROF. Identify the median value of each variable.

Solution

The median values are given in the table below.

R-Code
# Summary statistics of firm and CEO variables
# (AGE, LOGVAL, and PCTOWN can be added to the data frame in the same way)
Xymat <- data.frame(cbind(COMP,LOGCOMP,SALES,LOGSALES,TENURE,EXPER,VAL,PROF))
XymatA <- Xymat
Mean    <- sapply(XymatA, mean,  na.rm=TRUE)
S.d.    <- sapply(XymatA, sd,    na.rm=TRUE)
Minimum <- sapply(XymatA, min,   na.rm=TRUE)
Maximum <- sapply(XymatA, max,   na.rm=TRUE)
Median  <- sapply(XymatA, median,na.rm=TRUE)
summvar <- cbind(Mean, Median, S.d., Minimum, Maximum)
round(summvar, digits=3)

R-Code Output
             Mean   Median     S.d.   Minimum   Maximum
COMP     1131.434  809.000  851.426   307.000  4657.000
LOGCOMP     6.826    6.696    0.614     5.727     8.446
SALES    4110.515 2344.000 4721.951   228.000 21351.000
LOGSALES    7.809    7.760    1.003     5.429     9.969
TENURE     23.768   27.000   12.491     1.000    46.000
EXPER       8.929    6.000    8.308     0.500    35.000
VAL        44.039    3.600  183.639     0.100  1689.000
PROF      142.192   82.000  340.631 -1086.000  1618.000

Part II. Basic Linear Regression.

1. Plot SALES versus COMP and then LOGSALES versus LOGCOMP. Discuss the difficulties in modeling the relationship between SALES and COMP that are not apparent in the relationship between LOGSALES and LOGCOMP.

Solution

From the plot of SALES versus COMP, we see many CEOs in the lower left-hand corner of the plot and a few with large SALES and large COMP. It is difficult to discern an overall pattern. From the plot of LOGSALES versus LOGCOMP, the relationship is clearer. As LOGSALES increases, so does LOGCOMP.
There is still a lot of variability in the plot, but the patterns are clearer.

R-Code
par(mfrow=c(1, 2))
plot(SALES, COMP)
plot(LOGSALES, LOGCOMP)

R-Code Output
[Figure: Plots of Compensation vs Sales and Log Compensation vs Log Sales]

2. Compute correlations among the continuous variables COMP, LOGCOMP, AGE, SALES, LOGSALES, TENURE, EXPER, VAL, LOGVAL, PCTOWN, and PROF. Identify the variable (excluding LOGCOMP) that seems to have the strongest relationship with COMP. Also, identify the variable (excluding COMP) that seems to have the strongest relationship with LOGCOMP.

Solution

The variable LOGSALES has the highest correlation with both COMP and LOGCOMP.

R-Code
round(cor(XymatA), digits=3)

R-Code Output
          COMP LOGCOMP  SALES LOGSALES TENURE  EXPER    VAL  PROF
COMP     1.000   0.930  0.372    0.433  0.223  0.232  0.052 0.365
LOGCOMP  0.930   1.000  0.399    0.496  0.236  0.216  0.025 0.331
SALES    0.372   0.399  1.000    0.881  0.288 -0.071 -0.001 0.393
LOGSALES 0.433   0.496  0.881    1.000  0.349 -0.062  0.060 0.346
TENURE   0.223   0.236  0.288    0.349  1.000  0.390  0.064 0.287
EXPER    0.232   0.216 -0.071   -0.062  0.390  1.000  0.295 0.082
VAL      0.052   0.025 -0.001    0.060  0.064  0.295  1.000 0.124
PROF     0.365   0.331  0.393    0.346  0.287  0.082  0.124 1.000

3. Fit a basic linear model, using LOGCOMP as the outcome of interest and LOGSALES as the explanatory variable.

- 3a. Interpret the coefficient associated with LOGSALES as an elasticity.
- 3b. Provide 90% and 99% confidence intervals for your answer in 3a.

Solution

3a. For every one percent increase in SALES, we expect COMP to increase by about 0.303 percent.
3b. A 90% confidence interval is (0.214, 0.393). A 99% confidence interval is (0.162, 0.445).

R-Code
modelBLR <- lm(LOGCOMP ~ LOGSALES)
summary(modelBLR)
confint(modelBLR, level=.90)
confint(modelBLR, level=.99)

R-Code Output
> summary(modelBLR)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.45625    0.42494  10.487  < 2e-16 ***
LOGSALES     0.30344    0.05398   5.622 1.82e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5359 on 97 degrees of freedom
Multiple R-squared: 0.2457,     Adjusted R-squared: 0.238
F-statistic:  31.6 on 1 and 97 DF,  p-value: 1.817e-07
> confint(modelBLR, level=.90)
                  5 %      95 %
(Intercept) 3.7505515 5.1619585
LOGSALES    0.2138002 0.3930821
> confint(modelBLR, level=.99)
                0.5 %    99.5 %
(Intercept) 3.3397397 5.5727702
LOGSALES    0.1616175 0.4452648

Part III. Multiple Linear Regression - I.

1. Create a binary variable, PERCENT5, that indicates whether the CEO owns more than five percent of the firm's stock. Create another binary variable, GRAD, that indicates whether EDUCATN = 2.

Solution

R-Code
GRAD     <- 1*(EDUCATN == 2)
PERCENT5 <- 1*(PCTOWN > 5)

2. Run a regression model using LOGCOMP as the outcome of interest and four explanatory variables, LOGSALES, GRAD, PERCENT5, and EXPER.

- 2a. Interpret the sign of the coefficient associated with GRAD. Comment also on the statistical significance of this variable.
- 2b. For this model fit, is EXPER a statistically significant variable? To respond to this question, use a formal test of hypothesis. State your null and alternative hypotheses, decision-making criterion, and decision-making rule. Use a 10% significance level.

Solution

2a. The sign of the coefficient associated with GRAD indicates, other things being equal, that a CEO with a graduate degree has a lower level of compensation than his or her peers (with no degree or only a bachelor's degree). The t-ratio of -3.399 indicates that GRAD is statistically significant: the two-sided p-value is about 0.1% (0.000992), or about 0.05% for a one-sided test.

2b. Test \(H_0: \beta_{EXPER} = 0\) versus \(H_a: \beta_{EXPER} \neq 0\) at the 10% level of significance using a t-statistic. The degrees of freedom are df = 99 - 5 = 94.
For a two-sided test at the 10% significance level, the corresponding critical t-value is 1.661.
\begin{equation*}
t\mathrm{-ratio}=\frac{\mathrm{estimate-hypothesized~value~of~parameter}}
{\mathrm{standard~error~of~the~estimate}}=3.192
\end{equation*}
Because \(3.192 > 1.661\), we reject \(H_0\) in favor of the alternative. That is, EXPER is statistically significant.

R-Code
model1 <- lm(LOGCOMP ~ LOGSALES+GRAD+PERCENT5+EXPER)
summary(model1)

R-Code Output
> summary(model1)

Call:
lm(formula = LOGCOMP ~ LOGSALES + GRAD + PERCENT5 + EXPER)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.746812   0.409735  11.585  < 2e-16 ***
LOGSALES     0.279626   0.048584   5.756 1.08e-07 ***
GRAD        -0.353974   0.104126  -3.399 0.000992 ***
PERCENT5    -0.641284   0.175111  -3.662 0.000413 ***
EXPER        0.019242   0.006028   3.192 0.001921 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.473 on 94 degrees of freedom
Multiple R-squared: 0.4305,     Adjusted R-squared: 0.4062
F-statistic: 17.76 on 4 and 94 DF,  p-value: 6.858e-11

We now add LOGVAL to the regression model using LOGCOMP as the outcome of interest and the four explanatory variables LOGSALES, GRAD, PERCENT5, and EXPER. Correlations and the fitted regression model appear below.

III.3

- a. Determine the partial correlation coefficient between EXPER and LOGCOMP, controlling for the other explanatory variables.
- b. Compare the usual correlation coefficient between EXPER and LOGCOMP to the partial correlation calculated in part a. Contrast the different impressions that these coefficients provide and describe why differences may arise for this data set.

Table. Correlation Coefficients
> round(cor(cbind(LOGCOMP,LOGSALES,GRAD,PERCENT5,EXPER,LOGVAL)),digits=3)
         LOGCOMP LOGSALES   GRAD PERCENT5  EXPER LOGVAL
LOGCOMP    1.000    0.496 -0.331   -0.181  0.216  0.366
LOGSALES   0.496    1.000 -0.159   -0.034 -0.062  0.114
GRAD      -0.331   -0.159  1.000   -0.256 -0.207 -0.402
PERCENT5  -0.181   -0.034 -0.256    1.000  0.247  0.530
EXPER      0.216   -0.062 -0.207    0.247  1.000  0.535
LOGVAL     0.366    0.114 -0.402    0.530  0.535  1.000

Fitted Regression Model
> model2 <- lm(LOGCOMP ~ LOGSALES+GRAD+PERCENT5+EXPER+LOGVAL)
> summary(model2)

Call:
lm(formula = LOGCOMP ~ LOGSALES + GRAD + PERCENT5 + EXPER + LOGVAL)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.9347 -0.2800  0.0077  0.2019  1.2599 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.916566   0.375611  13.090  < 2e-16 ***
LOGSALES     0.246090   0.044939   5.476 3.68e-07 ***
GRAD        -0.239013   0.098382  -2.429    0.017 *  
PERCENT5    -1.011556   0.179882  -5.623 1.95e-07 ***
EXPER        0.005557   0.006291   0.883    0.379    
LOGVAL       0.132218   0.029558   4.473 2.18e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4314 on 93 degrees of freedom
Multiple R-squared:  0.5313,	Adjusted R-squared:  0.5061 
F-statistic: 21.09 on 5 and 93 DF,  p-value: 4.908e-14

Solution

3a. The partial correlation coefficient can be calculated using
\begin{eqnarray*}
r(LOGCOMP, x_{EXPER} \mid \text{other } x\text{'s}) &=& \frac{t(b_{EXPER})}{\sqrt{t(b_{EXPER})^{2}+n-(k+1)}}\\
&=& \frac{0.883}{\sqrt{0.883^{2}+99-6}}\\
&=& 0.091.
\end{eqnarray*}
3b. The (ordinary) correlation between EXPER and LOGCOMP is 0.216, whereas the partial correlation is 0.091. This suggests that when we control for other variables, such as LOGVAL, the relationship between EXPER and LOGCOMP becomes weaker. In particular, LOGVAL is strongly correlated with EXPER (0.535) and with LOGCOMP (0.366). This variable may be inducing part of the strength of the relationship captured in the ordinary correlation coefficient.

Part IV. Multiple Linear Regression - II.

The professional background (BACKGRD) of the CEO contains eleven categories, such as marketing, finance, accounting, insurance and so on. We use this factor to explain logarithmic compensation (LOGCOMP).

IV.1. The counts and mean effects of BACKGRD on LOGCOMP are described in the table below. A boxplot is given in Figure 1.
Describe what we learn from the table and boxplot about the effect of BACKGRD on LOGCOMP.

R-Code (summarize() is from the Hmisc package; the first pair of columns gives counts, the second pair gives means)
> cbind(summarize(LOGCOMP,BACKGRD,length),
+      round(summarize(LOGCOMP,BACKGRD,mean,na.rm=TRUE),digits=3))
   BACKGRD LOGCOMP BACKGRD LOGCOMP
       0       1       0   6.690
       1      17       1   7.064
       2       3       2   7.103
       3      13       3   6.916
       4      13       4   6.679
       5      12       5   6.553
       6       7       6   6.764
       7      12       7   7.067
       8       6       8   6.914
       9      14       9   6.621
      10       1      10   5.956
> boxplot(LOGCOMP~BACKGRD, ylab="LOGCOMP", xlab="BACKGRD")

[Figure 1. Box plot of logarithmic compensation, by professional background]

Solution

The table and the boxplot suggest important differences in compensation by professional background. Background = 10 has the lowest level of compensation (although only one person is in this category); Background = 2 has the highest mean; Background = 8 has the highest median.

IV.2. Consider a regression model using only the factor BACKGRD; the fitted output is below.

- a. Provide an expression for the regression function for this model, defining each term.
- b. Provide an expression for the fitted regression function, using the fitted output. Further, give the fitted value for an observation with BACKGRD = 0 and with BACKGRD = 1, both in logarithmic units and in dollars.
- c. Is BACKGRD a statistically significant determinant of LOGCOMP? State your null and alternative hypotheses, decision-making criterion, and your decision-making rules. (Hint: Use the \(R^2\) statistic to compute an F-statistic.)

> summary(lm(LOGCOMP ~ factor(BACKGRD)))

Call:
lm(formula = LOGCOMP ~ factor(BACKGRD))

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        6.68960    0.60605  11.038   <2e-16 ***
factor(BACKGRD)1   0.37441    0.62362   0.600    0.550
factor(BACKGRD)2   0.41388    0.69980   0.591    0.556
factor(BACKGRD)3   0.22644    0.62892   0.360    0.720
factor(BACKGRD)4  -0.01012    0.62892  -0.016    0.987
factor(BACKGRD)5  -0.13649    0.63079  -0.216    0.829
factor(BACKGRD)6   0.07449    0.64789   0.115    0.909
factor(BACKGRD)7   0.37771    0.63079   0.599    0.551
factor(BACKGRD)8   0.22431    0.65461   0.343    0.733
factor(BACKGRD)9  -0.06845    0.62732  -0.109    0.913
factor(BACKGRD)10 -0.73376    0.85708  -0.856    0.394
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.606 on 88 degrees of freedom
Multiple R-squared: 0.1248,     Adjusted R-squared: 0.0253

Solution

2a. \( \mathrm{E~} LOGCOMP = \beta_0 + \beta_1 \mathrm{I}(BACKGRD=1) + \cdots + \beta_{10} \mathrm{I}(BACKGRD=10)\),
where \(\mathrm{I}(BACKGRD=j)\) is a binary variable that indicates whether the background is of type j, \(j=1, \ldots, 10\). Here, the reference level is BACKGRD = 0.
2b. \( \widehat{LOGCOMP} = b_0 + b_1 \mathrm{I}(BACKGRD=1) + \cdots + b_{10} \mathrm{I}(BACKGRD=10)\),
where \(b_0 = 6.68960\), \(b_1 = 0.37441, \ldots, b_{10} = -0.73376\).
For BACKGRD = 0, the fitted log compensation is \(b_0 = 6.68960\) log dollars, or \(\exp(6.68960) = 804\) thousands of dollars.
For BACKGRD = 1, the fitted log compensation is \(b_0 + b_1 = 6.68960 + 0.37441 = 7.06401\) log dollars, or \(\exp(7.06401) = 1{,}169.11\) thousands of dollars.
2c. The null hypothesis is \(H_0: \beta_1 = \beta_2 = \cdots = \beta_{10} = 0\). The alternative hypothesis \(H_a\) is that at least one of the \(\beta_j\)'s is not zero. To compute the test statistic, we have
\begin{equation*}
F\mathrm{-ratio}=\frac{R^2}{1-R^2}\cdot
\frac{n-(k+1)}{k} = \frac{0.1248}{1-0.1248}\cdot\frac{99-11}{10}=1.2548.
\end{equation*}
We compare this to an F-distribution with degrees of freedom \(df_1 = 10\) and \(df_2 = 88\). From this distribution, the approximate 95th percentile is F-value = 1.95. Thus, because F-ratio < F-value, we do not have enough evidence to reject the null hypothesis. That is, BACKGRD is not a statistically significant determinant of compensation.

Part V. Variable Selection.

V.1. We run a regression model using LOGCOMP as the outcome of interest and four explanatory variables, LOGSALES, GRAD, PERCENT5, and EXPER. Figure 2 shows a set of four diagnostic plots of this model.

- a. In the upper left-hand panel is a plot of residuals versus fitted values. What type of model misspecification does this type of plot help detect?
- b. Does the plot of residuals versus fitted values in Figure 2 reveal a serious model misspecification?
- c.
In the upper right-hand panel is a normal <em>qq<\/em>-plot. Describe this plot and say what type of model misspecification it helps to detect.<\/li>\n<li>d. Does the normal <em>qq<\/em>-plot in Figure 2 reveal a serious model misspecification?<\/li>\n<li>e. In the lower right-hand panel is a plot of standardized residuals versus leverages. Describe this plot and say what type of model misspecification it helps to detect.<\/li>\n<li>f. Observation 87 appears in Figure 2. Is it a high leverage point? Describe the average leverage for this data set and give a rule of thumb cut-off for a point to be a high leverage point.<\/li>\n<li>g. Observation 87 appears in Figure 2. Is it an outlier? Give a rule of thumb cut-off for a point to be an outlier.<\/li>\n<\/ul>\n<p><figure id=\"attachment_1899\" class=\"wp-caption aligncenter\" style=\"max-width: 300px;\" aria-label=\"Figure 2. Diagnostic Plots of a Model of Logarithmic Compensation\"><a href=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/03\/CEOFig4.png\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/03\/CEOFig4-300x205.png\" alt=\"Figure 2. Diagnostic Plots of a Model of Logarithmic Compensation\" width=\"300\" height=\"205\" class=\"size-medium wp-image-1899\" srcset=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/03\/CEOFig4-300x205.png 300w, https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/03\/CEOFig4.png 593w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><figcaption class=\"wp-caption-text\">Figure 2. 
Diagnostic Plots of a Model of Logarithmic Compensation<\/figcaption><\/figure><br \/>\n<a id=\"displayCEOQuestionV.1\" href=\"javascript:toggle('toggleCEOQuestionV.1','displayCEOQuestionV.1');\"><i><strong>Solution<\/strong><\/i><\/a> <\/p>\n<div id=\"toggleCEOQuestionV.1\" style=\"display: none\">\n<pre>\r\n<strong>a.<\/strong> A plot of residuals versus fitted values helps to detect heteroscedasticity.\r\n<strong>b.<\/strong> No, this plot reveals no serious heteroscedasticity issues. \r\nRecall that we have already taken the logarithmic transformation of COMPENSATION; \r\nthis type of transformation often mitigates heteroscedasticity problems.\r\n<strong>c.<\/strong> This is a normal qq-plot based on model residuals. \r\nThe vertical axis gives the actual standardized residuals.\r\nThe horizontal axis gives the corresponding theoretical quantiles under the normality assumption. \r\nThis plot detects deviations from the assumption of normality; \r\nit also provides information about outlying observations.\r\n<strong>d.<\/strong> This plot reveals that the approximate normality assumption \r\nis reasonable for most of the distribution. \r\nSome of the smallest and largest observations are not what we would expect to see under the normality assumption.\r\n<strong>e.<\/strong> The distribution of standardized residuals helps us identify outlying observations. \r\nThe distribution of leverages helps us to identify high leverage points. \r\nThe plot helps us to identify their joint effect.\r\n<strong>f.<\/strong> Observation 87 is marked in the upper-left, upper-right and lower-left panels. \r\nIt is not a high leverage point. The leverage for this observation is about 0.02. \r\nFor this data set, the average leverage is \\((k+1)\/n = 5\/99 \\approx 0.05\\). \r\nThus, the leverage for observation 87 is less than the mean. 
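This comparison can be sketched numerically (a minimal check in Python, using n = 99 and k = 4 from above; the leverage of roughly 0.02 for observation 87 is read off Figure 2):

```python
# Sketch: leverage rule-of-thumb check for observation 87.
# Values are taken from the solution above; h_87 is read from Figure 2.
n = 99               # observations used in the fit
k = 4                # explanatory variables: LOGSALES, GRAD, PERCENT5, EXPER
h_bar = (k + 1) / n  # average leverage, (k+1)/n, about 0.05
cutoff = 3 * h_bar   # common high-leverage cut-off: 3 times the average
h_87 = 0.02          # approximate leverage of observation 87

print(h_87 > cutoff)  # False: observation 87 is not a high leverage point
```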
\r\nThe usual cut-off for high leverage is 3 times the mean, or about 0.15 for this data set.\r\n<strong>g.<\/strong> The standardized residual for observation 87 is above 3 (about 3.3). \r\nOne cut-off for an observation to be an outlier is 2 (another is 3). \r\nObservation 87 exceeds both cut-offs and would typically be labeled as an outlier.\r\n\r\n<\/pre>\n<\/div>\n<p>V.2 We run a regression model using LOGCOMP as the outcome of interest and four explanatory variables, LOGSALES, GRAD, PERCENT5, and EXPER. Figure 3 displays a plot of LOGVAL versus the residuals from this model. The correlation between these two variables is 0.292.<\/p>\n<ul>\n<li>a. What do we hope to learn from a plot of a potential explanatory variable versus residuals from a model fit?<\/li>\n<li>b. What new model does the information in Figure 3 suggest that we specify?<\/li>\n<\/ul>\n<p><figure id=\"attachment_1900\" class=\"wp-caption aligncenter\" style=\"max-width: 300px;\" aria-label=\"Figure 3. Plot of LOGVAL versus Standardized Residuals from a Model of Logarithmic Compensation\"><a href=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/03\/CEOFig5.png\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/03\/CEOFig5-300x205.png\" alt=\"Figure 3. Plot of LOGVAL versus Standardized Residuals from a Model of Logarithmic Compensation\" width=\"300\" height=\"205\" class=\"size-medium wp-image-1900\" srcset=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/03\/CEOFig5-300x205.png 300w, https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/03\/CEOFig5.png 593w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><figcaption class=\"wp-caption-text\">Figure 3. 
Plot of LOGVAL versus Standardized Residuals from a Model of Logarithmic Compensation<\/figcaption><\/figure><br \/>\n<a id=\"displayCEOQuestionV.2\" href=\"javascript:toggle('toggleCEOQuestionV.2','displayCEOQuestionV.2');\"><i><strong>Solution<\/strong><\/i><\/a> <\/p>\n<div id=\"toggleCEOQuestionV.2\" style=\"display: none\">\n<pre>\r\n<strong>a.<\/strong> A plot of a potential explanatory variable versus residuals \r\nfrom a model fit suggests how to incorporate that variable into \r\nthe model specification. We think of residuals from a model fit as the <em>y<\/em> \r\nvalue after we have extracted, or \"controlled for,\" the <em>x<\/em> values. \r\nIf there is a strong relationship between the residuals and an explanatory variable, \r\nthen this suggests a pattern that we might use in developing our model fit.\r\n<strong>b.<\/strong> The positive correlation and the approximate linear relation \r\nin the plot suggest incorporating LOGVAL linearly into the model.\r\n<\/pre>\n<\/div>\n<p><strong>Part VI. Some Algebra Problems<\/strong>.<\/p>\n<p>VI.1 <strong>Regression through the origin<\/strong>. Consider the model \\(y_i=\\beta_1 z_i^2 + \\varepsilon_i\\), a quadratic model passing through the origin.<\/p>\n<ul>\n<li>a. Determine the least squares estimate of \\(\\beta_1\\).<\/li>\n<li>b. Using the following set of <em>n<\/em>=5 observations, give a numerical result for the least squares estimate of \\(\\beta_1\\) determined in part (a).<\/li>\n<\/ul>\n<p>\\begin{equation*}<br \/>\n\\begin{array}{l|rrrrr}<br \/>\n\\hline<br \/>\ni & 1 & 2 & 3 & 4 & 5 \\\\<br \/>\nz_i & -2 & -1 & 0 & 1 & 2 \\\\<br \/>\ny_i & 4 & 0 & 0 & 1 & 4 \\\\ \\hline<br \/>\n\\end{array}<br \/>\n\\end{equation*}<br \/>\n<a id=\"displayCEOQuestionVI.1\" href=\"javascript:toggle('toggleCEOQuestionVI.1','displayCEOQuestionVI.1');\"><i><strong>Solution<\/strong><\/i><\/a> <\/p>\n<div id=\"toggleCEOQuestionVI.1\" style=\"display: none\">\n<pre>\r\n<strong>a<\/strong>. 
The error sum of squares is\r\n\\begin{equation*}\r\n\\mathrm{SS}(b_1^{\\ast})=\\sum_{i=1}^n\\left( y_i - b_1^{\\ast }z_i^2\\right) ^{2}.\r\n\\end{equation*} Taking the derivative, we have\r\n\\begin{equation*}\r\n\\frac{\\partial }{\\partial b_1^{\\ast }}\\mathrm{SS}(b_1^{\\ast})=\\sum_{i=1}^n(-2z_{i}^2)\\left( y_{i}-b_1^{\\ast }z_{i}^2\\right).\r\n\\end{equation*} Setting this equal to zero yields\r\n\\begin{equation*}\r\n\\sum_{i=1}^n\\left(b_1^{\\ast }z_{i}^4-z_{i}^2 y_{i}\\right) =0.\r\n\\end{equation*} Solving for \\(b_1\\) gives our result:\r\n\\begin{equation*}\r\nb_1 = \\frac{\\sum_{i=1}^n z_i^2 y_i}{\\sum_{i=1}^nz_i^{4}}.\r\n\\end{equation*}\r\n<strong>b.<\/strong> Plugging in, we have\r\n\\begin{eqnarray*}\r\nb_1 &=& \\frac{\\sum_{i=1}^n z_i^2 y_i}{\\sum_{i=1}^nz_i^{4}} \\\\\r\n&=& \\frac{(-2)^2 (4)+(-1)^2(0)+(0)^2(0)+(1)^2(1)+(2)^2(4) }\r\n{(-2)^4+(-1)^4+(0)^4+(1)^4+(2)^4}\r\n= \\frac{33}{34}\r\n\\end{eqnarray*}\r\n<\/pre>\n<\/div>\n<p>VI.2  You are doing regression with one explanatory variable and so consider the basic linear regression model \\(y_i = \\beta_0 +  \\beta_1 x_i + \\varepsilon_i\\).<\/p>\n<ul>\n<li>a.  Show that the <em>i<\/em>th leverage can be simplified to<br \/>\n\\begin{equation*}<br \/>\nh_{ii} = \\frac{1}{n} + \\frac{(x_i - \\overline{x})^2}{(n-1) s_x^2}.<br \/>\n\\end{equation*}<\/li>\n<li>b.  Show that  \\(\\overline{h}= 2 \/ n\\).<\/li>\n<li>c.  Suppose that \\(h_{ii} = 6\/n\\). How many standard deviations is \\(x_i\\) away (either above or below) from the mean?<\/li>\n<\/ul>\n<p><a id=\"displayCEOQuestionVI.2\" href=\"javascript:toggle('toggleCEOQuestionVI.2','displayCEOQuestionVI.2');\"><i><strong>Solution<\/strong><\/i><\/a> <\/p>\n<div id=\"toggleCEOQuestionVI.2\" style=\"display: none\">\n<pre>\r\n<strong>a.<\/strong> \\begin{equation*}\r\n\\mathbf{x_i}=(1,x_i)' , ~~~\\mathbf{X}^{\\prime }\\mathbf{X=}\\left(\\begin{array}{ccc}1 & ... & 1 \\\\x_1 & ... & x_n\\end{array}\\right)\\left(\\begin{array}{cc}1 & x_1 \\\\... & ... 
\\\\1 & x_n\\end{array}\\right)=\\left(\\begin{array}{cc}n & \\sum_{i=1}^{n}x_i \\\\\\sum_{i=1}^{n}x_i & \\sum_{i=1}^{n}x_i^2\\end{array}\\right)\\\\\r\n\\end{equation*}\r\nand \\begin{equation*}\\left( \\mathbf{X}^{\\prime }\\mathbf{X}\\right)^{-1}\\mathbf{=}\\frac{1}{ \\sum_{i=1}^{n}x_i^2-n\\overline{x}^2}\\left(\\begin{array}{cc}n^{-1}\\sum_{i=1}^{n}x_i^2 & -\\overline{x} \\\\-\\overline{x} & 1\\end{array}\\right) .\r\n\\end{equation*}\r\n\\begin{eqnarray*}\r\nh_{ii} & = & \\mathbf{x_i}^{\\prime}\\left(\\mathbf{X}^{\\prime }\\mathbf{X}\\right)^{-1}\\mathbf{x_i}\\\\\r\n&=&\\left(\\begin{array}{cc}1 & x_i\\end{array}\\right)\\frac{1}{ \\sum_{i=1}^{n}x_i^2-n\\overline{x}^2}\\left(\\begin{array}{cc}n^{-1}\\sum_{i=1}^{n}x_i^2 & -\\overline{x} \\\\-\\overline{x} & 1\\end{array}\\right)\\left(\\begin{array}{c}1 \\\\x_i\\end{array}\\right)\\\\\r\n&=&\\frac{1}{ \\sum_{i=1}^{n}x_i^2-n\\overline{x}^2}\\left(\\begin{array}{cc}n^{-1}\\sum_{i=1}^{n}x_i^2-\\overline{x}x_i & -\\overline{x}+x_i\\end{array}\\right)\\left(\\begin{array}{c}1 \\\\x_i\\end{array}\\right)\\\\\r\n&=&\\frac{n^{-1}\\sum_{i=1}^{n}x_i^2-\\overline{x}x_i-\\overline{x}x_i+x_i^2}{\\sum_{i=1}^{n}x_i^2-n\\overline{x}^2}\\\\\r\n&=&\\frac{n^{-1}(\\sum_{i=1}^{n}x_i^2-n\\overline{x}^2)+\\overline{x}^2-2\\overline{x}x_i+x_i^2}{\\sum_{i=1}^{n}x_i^2-n\\overline{x}^2}\\\\\r\n&=&\\frac{1}{n}+\\frac{(x_i-\\overline{x})^2}{\\sum_{i=1}^{n}(x_i-\\overline{x})^2}\\\\\r\n&=&\\frac{1}{n}+\\frac{(x_i-\\overline{x})^2}{(n-1)s_x^2}\r\n\\end{eqnarray*}\r\n<strong>b.<\/strong>    \\begin{equation*} \\bar{h}=\\frac{\\sum_{i=1}^n h_{ii}}{n}=\\frac{1}{n}+\\frac{1}{n}\\sum_{i=1}^n\\frac{(x_i-\\bar{x})^2}{(n-1)s_{x}^2}=\\frac{1}{n}+\\frac{1}{n}=\\frac{2}{n}\r\n\\end{equation*}\r\n<strong>c.<\/strong> Let \\(c=(x_i-\\bar{x})\/s_x\\)\r\n\\begin{equation*} \\frac{6}{n}=h_{ii}=\\frac{1}{n}+\\frac{(x_i-\\bar{x})^2}{(n-1)s_{x}^2}=\\frac{1}{n}+\\frac{(c 
s_x)^2}{(n-1)s_{x}^2}=\\frac{1}{n}+\\frac{c^2}{n-1}\r\n\\end{equation*}\r\nSolving for \\(c\\) gives\r\n\\begin{equation*} c=\\sqrt{5-\\frac{5}{n}}\r\n\\end{equation*}\r\nFor large <em>n<\/em>, \\(x_i\\) is approximately \\(c=\\sqrt{5}=2.236\\) \r\n standard deviations away from the mean.\r\n<\/pre>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A chief executive officer (CEO) is a leader of a firm or organization. The CEO leads by developing and implementing a strategic policy for the firm. The CEO is in charge of a management team &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":1864,"menu_order":8,"comment_status":"closed","ping_status":"open","template":"","meta":{"jetpack_post_was_ever_published":false},"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P8cLPd-tF","acf":[],"_links":{"self":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/1839"}],"collection":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/comments?post=1839"}],"version-history":[{"count":67,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/1839\/revisions"}],"predecessor-version":[{"id":2304,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/1839\/revisions\/2304"}],"up":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/1864"}],"wp:attachment":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/media?parent=1839"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}