{"id":3304,"date":"2015-04-11T23:03:24","date_gmt":"2015-04-12T04:03:24","guid":{"rendered":"http:\/\/www.ssc.wisc.edu\/~jfrees\/?page_id=3304"},"modified":"2015-08-21T13:37:49","modified_gmt":"2015-08-21T18:37:49","slug":"2-3-basic-linear-regression-model","status":"publish","type":"page","link":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/regression\/basic-linear-regression\/2-3-basic-linear-regression-model\/","title":{"rendered":"2.2 Basic Linear Regression Model"},"content":{"rendered":"<div class=\"scbb-content-box scbb-content-box-gray\">In this section, you learn how to: \n<ul>\n<li>Explain assumptions for the observables representation of the model<\/li>\n<li>Explain assumptions for the error representation of the model<\/li>\n<li>Contrast a statistic with a parameter<\/li>\n<\/ul>\n<h2 style=\"text-align: center\"><a href=\"http:\/\/flash.bus.wisc.edu\/data\/act_sci\/Frees\/Regression2015\/Chapter2\/Part2\/BasicLinearRegrModel.html\" target=\"_blank\">Video Overview of the Section <\/a><a href=\"http:\/\/flash.bus.wisc.edu\/data\/act_sci\/Frees\/Regression2015\/Chapter2\/Part2\/BasicLinearRegrModel.mp4\" target=\"_blank\">(<em>Alternative .mp4 Version &#8211; 5:35 min<\/em>)<\/a><\/h2>\n<p><\/p><\/div>\n<p>The scatter plot, correlation coefficient and the fitted regression line are useful devices for summarizing the relationship between two variables for a specific data set. To infer general relationships, we need models to represent outcomes of broad populations. <\/p>\n<p>This chapter focuses on a &#8220;basic linear regression&#8221; model. The &#8220;linear regression&#8221; part comes from the fact that we fit a line to the data. The &#8220;basic&#8221; part is because we use only one explanatory variable, <em>x<\/em>. This model is also known as a &#8220;simple&#8221; linear regression. This text avoids this language because it gives the false impression that regression ideas and interpretations with one explanatory variable are always straightforward. <\/p>\n<p> We now introduce two sets of assumptions of the basic model, the &#8220;observables&#8221; and the &#8220;error&#8221; representations. They are equivalent but each will help us as we later extend regression models beyond the basics.<br \/>\n<div class=\"scbb-content-box scbb-content-box-gray\">\\begin{matrix}<br \/>\n\\begin{array}{c} \\hline \\text{Basic Linear Regression Model} \\\\ \\text{Observables Representation Sampling Assumptions}<br \/>\n\\end{array} \\\\<br \/>\n\\begin{array}{ll} \\hline {F1.}&#038;{ \\mathrm{E}~y_i=\\beta_0 + \\beta_1 x_i .} \\\\<br \/>\n {F2.}&#038;{ \\{x_1,\\ldots ,x_n\\} \\text{  are non-stochastic variables.}} \\phantom{XXX}\\\\ {F3.}&#038;{ \\mathrm{Var}~y_i=\\sigma ^{2}.} \\\\ {F4.}&#038;{ \\{y_i\\} \\text{  are independent random variables.}} \\\\ \\hline \\end{array}\\end{matrix}<br \/>\n<\/div><br \/>\n The &#8220;observables representation&#8221; focuses on variables that we can see (or observe), \\((x_i,y_i)\\). Inference about the distribution of <em>y<\/em> is conditional on the observed explanatory variables, so that we may treat \\(\\{x_1,\\ldots ,x_n\\}\\) as non-stochastic variables (assumption F2). When considering types of sampling mechanisms for \\((x_i,y_i)\\), it is convenient to think of a <em>stratified random sampling<\/em> scheme, where values of \\(\\{x_1,\\ldots ,x_n\\}\\) are treated as the strata, or group. Under stratified sampling, for each unique value of \\(x_i\\), we draw a random sample from a population. 
<p>To illustrate, suppose you are drawing from a database of firms to understand stock return performance (<em>y</em>) and wish to stratify based on the size of the firm. If the amount of assets is a continuous variable, then we can imagine drawing a sample of size 1 for each firm. In this way, we hypothesize a distribution of stock returns conditional on firm asset size.</p>

<p><em>Digression</em>: You will often see reports that summarize results for the "top 50 managers" or the "best 100 universities," measured by some outcome variable. In regression applications, make sure that you do not select observations based on a dependent variable, such as the highest stock return, because this is stratifying based on the <em>y</em>, not the <em>x</em>. <a href="http://www.ssc.wisc.edu/~jfrees/?p=2656">Section 6.3</a> will discuss sampling procedures in greater detail.</p>

<p>Stratified sampling also provides motivation for assumption F4, the independence among responses. One can motivate assumption F1 by thinking of \((x_i, y_i)\) as a draw from a population, where the mean of the conditional distribution of \(y_i\) given \(x_i\) is linear in the explanatory variable. Assumption F3 is known as <em>homoscedasticity</em>, which we will discuss extensively in <a href="http://www.ssc.wisc.edu/~jfrees/?p=3574">Section 5.7</a>. See Goldberger (1991) for additional background on this representation.</p>

<p>A fifth assumption that is often implicitly used is:</p>

<div class="scbb-content-box scbb-content-box-gray">F5. \(\{y_i\}\) are normally distributed.</div>

<p>This assumption is not required for many statistical inference procedures because central limit theorems provide approximate normality for many statistics of interest. However, formal justification for some, such as <em>t</em>-statistics, does require this additional assumption.</p>

<p>In contrast to the observables representation, an alternative set of assumptions focuses on the deviations, or "errors," in the regression, defined as \(\varepsilon_i = y_i - (\beta_0 + \beta_1 x_i)\).</p>

<div class="scbb-content-box scbb-content-box-gray">
\begin{matrix}
\begin{array}{c} \hline \text{Basic Linear Regression Model} \\ \text{Error Representation Sampling Assumptions} \end{array} \\
\begin{array}{ll} \hline
\text{E1.} & y_i = \beta_0 + \beta_1 x_i + \varepsilon_i. \\
\text{E2.} & \{x_1, \ldots, x_n\} \text{ are non-stochastic variables.} \\
\text{E3.} & \mathrm{E}~\varepsilon_i = 0 \text{ and } \mathrm{Var}~\varepsilon_i = \sigma^2. \\
\text{E4.} & \{\varepsilon_i\} \text{ are independent random variables.} \\
\hline \end{array}
\end{matrix}
</div>

<p>The "error representation" is based on the Gaussian theory of errors (see Stigler, 1986, for a historical background). Assumption E1 assumes that <em>y</em> is in part due to a linear function of the observed explanatory variable, <em>x</em>. Other unobserved variables that influence the measurement of <em>y</em> are interpreted to be included in the "error" term \(\varepsilon_i\), which is also known as the "disturbance" term. The independence of errors, E4, can be motivated by assuming that \(\{\varepsilon_i\}\) are realized through a simple random sample from an unknown population of errors.</p>

<p>Assumptions E1-E4 are equivalent to F1-F4.</p>
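<p>Continuing the simulated example above, a minimal sketch of this correspondence: under E1, the errors are the deviations of the responses from the population regression line, so their sample mean and variance should be near the values that E3 specifies.</p>

<pre>
# Sketch of the correspondence between the two representations,
# continuing the simulation above (illustrative values only).
eps <- y - (beta0 + beta1 * x)  # E1: deviation from the population line
mean(eps)                       # near 0            (E3: E eps = 0)
var(eps)                        # near sigma^2 = 1  (E3: Var eps = sigma^2)
# Conversely, drawing independent errors with mean 0 and variance
# sigma^2 and setting y <- beta0 + beta1*x + eps reproduces F1-F4.
# In real data, beta0 and beta1 are unknown, so the errors are unobservable.
</pre>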
<p>The error representation provides a useful springboard for motivating goodness of fit measures (Section 2.3). However, a drawback of the error representation is that it draws attention away from the observable quantities \((x_i, y_i)\) to an unobservable quantity, \(\{\varepsilon_i\}\). To illustrate, the sampling basis, viewing \(\{\varepsilon_i\}\) as a simple random sample, is not directly verifiable because one cannot directly observe the sample \(\{\varepsilon_i\}\). Moreover, the assumption of additive errors in E1 will be troublesome when we consider nonlinear regression models.</p>

<p>Figure 2.3 illustrates some of the assumptions of the basic linear regression model. The data \((x_1, y_1)\), \((x_2, y_2)\) and \((x_3, y_3)\) are observed and are represented by the circular opaque plotting symbols. According to the model, these observations should be close to the regression line \(\mathrm{E}~y = \beta_0 + \beta_1 x\). Each deviation from the line is random. We will often assume that the distribution of deviations may be represented by a normal curve, as in Figure 2.3.</p>

<figure>
<a href="http://www.ssc.wisc.edu/~jfrees/wp-content/uploads/2015/04/F2NormalCurve.png"><img src="http://www.ssc.wisc.edu/~jfrees/wp-content/uploads/2015/04/F2NormalCurve.png" alt="Normal curves centered on the regression line at three values of the explanatory variable" width="576" height="288" /></a>
<figcaption>Figure 2.3 The distribution of the response varies by the level of the explanatory variable.</figcaption>
</figure>

<h2 style="text-align: center"><i><strong>R Code for Figure 2.3</strong></i></h2>

<pre>
# FIGURE 2.3: normal curves centered on the regression line
# at three fixed values of the explanatory variable
par(mar = c(2.1, .2, .2, .2), cex = 1.2)
x <- seq(-2.5, 2.5, by = 0.01)
y <- dnorm(x, sd = 0.8)
# Draw a normal density sideways at each of x1, x2, x3
plot(y, x, xlim = c(0, 3), ylim = c(-3, 5), type = "l",
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")
lines(y + 1, x + 1)
lines(y + 2, x + 2)
axis(1, c(0, 1, 2),
     labels = c(expression(x[1]), expression(x[2]), expression(x[3])), cex = 1.2)
abline(0, 1)              # the true regression line

segments(0, -3, 0, 5, lty = 2)
segments(1, -3, 1, 5, lty = 2)
segments(2, -3, 2, 5, lty = 2)

points(0, 0.3, pch = 19)  # the observed responses
points(1, 0.5, pch = 19)
points(2, 1.8, pch = 19)

arrows(0.5, 3, 0.6, 0.6, code = 2, angle = 10, length = 0.2)
text(0.6, 3.5, "True Unknown \n Regression Line", cex = .9)
arrows(1.5, -1.6, 1.2, -0.1, code = 2, angle = 10, length = 0.2)
text(1.5, -2.3, "Each Response Tends \n To Fall Near The Height Of \n The Regression Line", cex = .9)
text(2.6, -.2, "The Center Of Each Normal \n Curve Is At The Height Of \n The Regression Line", cex = .9)
</pre>

<p>The basic linear regression model assumptions describe the underlying population.
Table 2.2 highlights the idea that characteristics of this population can be summarized by the parameters \(\beta_0\), \(\beta_1\) and \(\sigma^2\). In Section 2.1, we summarized data from a sample, introducing the statistics \(b_0\) and \(b_1\). Section 2.3 will introduce \(s^2\), the statistic corresponding to the parameter \(\sigma^2\).</p>

<div class="scbb-content-box scbb-content-box-gray">
\begin{matrix}
\begin{array}{c} \text{Table 2.2 Summary Measures of the Population and Sample} \end{array} \\
\begin{array}{ccccc} \hline
\text{Data} & \text{Summary Measures} & \multicolumn{2}{c}{\text{Regression Line}} & \text{Variance} \\
 & & \text{Intercept} & \text{Slope} & \\ \hline
\text{Population} & \text{Parameters} & \beta_0 & \beta_1 & \sigma^2 \\
\text{Sample} & \text{Statistics} & b_0 & b_1 & s^2 \\ \hline
\end{array}
\end{matrix}
</div>
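<p>To make the contrast between statistics and parameters concrete, here is a minimal R sketch under the same illustrative assumptions as the simulation above: fitting the line by least squares yields the statistics \(b_0\), \(b_1\) and \(s^2\), which estimate the parameters \(\beta_0\), \(\beta_1\) and \(\sigma^2\).</p>

<pre>
# Statistics versus parameters (Table 2.2), with hypothetical true
# values beta0 = 2, beta1 = 0.5, sigma^2 = 1. Data are re-simulated
# here so the sketch is self-contained.
set.seed(1)
x <- rep(c(10, 20, 30), each = 5)
y <- 2 + 0.5 * x + rnorm(length(x))
fit <- lm(y ~ x)
coef(fit)              # the statistics b0 and b1 estimate beta0 and beta1
summary(fit)$sigma^2   # the statistic s^2 estimates sigma^2 (Section 2.3)
</pre>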