{"id":4794,"date":"2015-08-16T09:34:24","date_gmt":"2015-08-16T14:34:24","guid":{"rendered":"http:\/\/www.ssc.wisc.edu\/~jfrees\/?page_id=4794"},"modified":"2015-08-18T13:31:46","modified_gmt":"2015-08-18T18:31:46","slug":"example-lottery-sales-continued","status":"publish","type":"page","link":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/regression\/basic-linear-regression\/2-6-building-a-better-model-residual-analysis\/example-lottery-sales-continued\/","title":{"rendered":"Example: Lottery Sales &#8212; Continued"},"content":{"rendered":"<p>Figure 2.7 exhibits an outlier; the point in the upper left-hand side of the plot represents a zip code that includes Kenosha, Wisconsin. Sales for this zip code are unusually high given its population. Kenosha is close to the Illinois border; Illinois residents probably participate in the Wisconsin lottery, effectively increasing the potential pool of sales in Kenosha. Table 2.7 summarizes the regression fit both with and without this zip code. 
<\/p>\n<p> \\begin{matrix}<br \/>\n\\begin{array}{c}<br \/>\n\\text{Table 2.7 Regression Results with and without Kenosha}<br \/>\n\\end{array}\\\\\\small<br \/>\n \\begin{array}{l|rrrrr}  \\hline \\text{Data} &#038; b_0 &#038; b_1 &#038; s &#038; R^2(\\%) &#038; t(b_1) \\\\ \\hline \\text{With Kenosha} &#038; 469.7 &#038; 0.647 &#038; 3,792 &#038; 78.5 &#038; 13.26 \\\\ \\text{Without Kenosha} &#038; -43.5 &#038; 0.662 &#038; 2,728 &#038; 88.3 &#038; 18.82 \\\\ \\hline \\end{array}<br \/>\n \\end{matrix}<br \/>\n\r\n<h2 style=\"text-align: center;\"><a id=\"displayTextf8.3\" href=\"javascript:togglecode('toggleTextf8.3','displayTextf8.3');\"><i><strong>See R Code in Action<\/strong><\/i><\/a><\/h2><div class=\"sage-r\" id=\"toggleTextf8.3\" style=\"display: block\"><script type=\"text\/x-sage\">\r\nLot <- read.csv(\"http:\/\/instruction.bus.wisc.edu\/jfrees\/jfreesbooks\/Regression%20Modeling\/BookWebDec2010\/CSVData\/WiscLottery.csv\",header=TRUE)\r\nmodel.basiclinearreg <-lm(SALES ~ POP, Lot)\r\nsummary(model.basiclinearreg)\r\nmodel.Kenosha <- lm(SALES ~ POP, Lot, subset=-c(9))\r\nsummary(model.Kenosha)\r\n<\/script><\/div>\r\n<\/p>\n<h2 style=\"text-align: center;\"><a id=\"displayText2.77t\" href=\"javascript:togglecode('toggleText2.77t','displayText2.77t');\"><i><strong>R Code and Output for Table 2.7<\/strong><\/i><\/a> <\/h2>\n<div id=\"toggleText2.77t\" style=\"display: none\">\n<pre>\r\n<strong>R-Code<\/strong>\r\n# Fit using all 50 zip codes\r\nmodel.basiclinearreg &lt;-lm(SALES ~ POP, Lot)\r\nsummary(model.basiclinearreg)\r\n# Refit after dropping row 9, the Kenosha zip code\r\nmodel.Kenosha &lt;-lm(SALES ~ POP, Lot, subset=-c(9))\r\nsummary(model.Kenosha)\r\n<\/pre>\n<pre>\r\n<strong>R-Code Output<\/strong>\r\n> model.basiclinearreg &lt;-lm(SALES ~ POP, Lot)\r\n> summary(model.basiclinearreg)\r\n\r\nCall:\r\nlm(formula = SALES ~ POP, data = Lot)\r\n\r\nResiduals:\r\n   Min     1Q Median     3Q    Max \r\n -6047  -1461   -670    486  18229 \r\n\r\nCoefficients:\r\n            Estimate Std. 
Error t value Pr(>|t|)    \r\n(Intercept) 469.7036   702.9062    0.67     0.51    \r\nPOP           0.6471     0.0488   13.26   <2e-16 ***\r\n---\r\nSignif. codes:  0 \u2018***\u2019 0.001 \u2018**\u2019 0.01 \u2018*\u2019 0.05 \u2018.\u2019 0.1 \u2018 \u2019 1\r\n\r\nResidual standard error: 3790 on 48 degrees of freedom\r\nMultiple R-squared:  0.785,\tAdjusted R-squared:  0.781 \r\nF-statistic:  176 on 1 and 48 DF,  p-value: <2e-16\r\n\r\n> model.Kenosha<-lm(SALES ~ POP, Lot, subset=-c(9))\r\n> summary(model.Kenosha)\r\n\r\nCall:\r\nlm(formula = SALES ~ POP, data = Lot, subset = -c(9))\r\n\r\nResiduals:\r\n   Min     1Q Median     3Q    Max \r\n -6089  -1001   -193    816   7878 \r\n\r\nCoefficients:\r\n            Estimate Std. Error t value Pr(>|t|)    \r\n(Intercept) -43.4640   511.2931   -0.09     0.93    \r\nPOP           0.6621     0.0352   18.82   <2e-16 ***\r\n---\r\nSignif. codes:  0 \u2018***\u2019 0.001 \u2018**\u2019 0.01 \u2018*\u2019 0.05 \u2018.\u2019 0.1 \u2018 \u2019 1\r\n\r\nResidual standard error: 2730 on 47 degrees of freedom\r\nMultiple R-squared:  0.883,\tAdjusted R-squared:  0.88 \r\nF-statistic:  354 on 1 and 47 DF,  p-value: <2e-16\r\n<\/pre>\n<\/div>\n<figure class=\"wp-caption aligncenter\" style=\"max-width: 300px;\" aria-label=\"Figure 2.7 Scatter plot of SALES versus POP, with the outlier corresponding to Kenosha marked.\"><a href=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/04\/F2PlotWithKenosha.png\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/04\/F2PlotWithKenosha.png\" alt=\"F2PlotWithKenosha\" width=\"288\" height=\"288\" class=\"aligncenter size-full wp-image-3262\" srcset=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/04\/F2PlotWithKenosha.png 288w, https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/04\/F2PlotWithKenosha-150x150.png 150w\" sizes=\"(max-width: 288px) 100vw, 288px\" \/><\/a><figcaption 
class=\"wp-caption-text\">Figure 2.7 Scatter plot of SALES versus POP, with the outlier corresponding to Kenosha marked.<\/figcaption><\/figure>\n<h2 style=\"text-align: center;\"><a id=\"displayText2.7f\" href=\"javascript:togglecode('toggleText2.7f','displayText2.7f');\"><i><strong>R Code for Figure 2.7<\/strong><\/i><\/a> <\/h2>\n<div id=\"toggleText2.7f\" style=\"display: none\">\n<pre>\r\n<strong>R-Code<\/strong>\r\npar(mar=c(4.1,3.9,2,1),cex=1.1)\r\n# with() looks up POP and SALES in the Lot data frame loaded for Table 2.7\r\nwith(Lot, plot(POP, SALES, ylab=\"\", las=1))\r\nmtext(\"SALES\", side=2, at=36000,cex=1.1, las=1)\r\n# Label the outlying point\r\ntext(5000, 24000, \"Kenosha\")\r\n<\/pre>\n<\/div>\n<p> For the purposes of inference about the slope, the presence of Kenosha does not alter the results dramatically. Both slope estimates are qualitatively similar and the corresponding \\(t\\)-statistics are very high, well above cut-offs for statistical significance. However, there are dramatic differences when assessing the quality of the fit. The coefficient of determination, \\(R^2\\), increases from 78.5% to 88.3% when Kenosha is deleted. Moreover, our \"typical deviation\" \\(s\\) drops by over $1,000, from 3,792 to 2,728. This is particularly important if we wish to tighten our prediction intervals. <\/p>\n<p> As part of checking the model assumptions, it is also customary to examine the normality of the residuals. One way of doing this is the \\(qq\\) plot, introduced in Section 1.2. The two panels in Figure 2.8 are \\(qq\\) plots with and without the Kenosha zip code. Recall that points falling \"close\" to a straight line indicate approximate normality. In the right-hand panel of Figure 2.8, the sequence of points does appear to be linear, suggesting that the residuals are approximately normally distributed. This is not the case in the left-hand panel, where the sequence of points appears to climb dramatically for large quantiles. Interestingly, the non-normality is due to a single outlier, not a pattern of skewness that is common to all the observations. 
<\/p>\n<figure class=\"wp-caption aligncenter\" style=\"max-width: 300px;\" aria-label=\"Figure 2.8 \\(qq\\) Plots of Wisconsin Lottery Residuals.     The left-hand panel is based on all 50 points. The right-hand panel is based on 49 points, residuals from a regression after removing Kenosha.\"><a href=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/04\/F2QQplotsKenosha.png\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/04\/F2QQplotsKenosha.png\" alt=\"F2QQplotsKenosha\" width=\"576\" height=\"288\" class=\"aligncenter size-full wp-image-3263\" srcset=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/04\/F2QQplotsKenosha.png 576w, https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/04\/F2QQplotsKenosha-300x150.png 300w\" sizes=\"(max-width: 576px) 100vw, 576px\" \/><\/a><figcaption class=\"wp-caption-text\">Figure 2.8 \\(qq\\) Plots of Wisconsin Lottery Residuals.     The left-hand panel is based on all 50 points. 
The right-hand panel is based on 49 points, residuals from a regression after removing Kenosha.<\/figcaption><\/figure>\n<h2 style=\"text-align: center;\"><a id=\"displayText2.8f\" href=\"javascript:togglecode('toggleText2.8f','displayText2.8f');\"><i><strong>R Code for Figure 2.8<\/strong><\/i><\/a> <\/h2>\n<div id=\"toggleText2.8f\" style=\"display: none\">\n<pre>\r\n<strong>R-Code<\/strong>\r\n\r\npar(mfrow=c(1, 2), mar=c(4.1,3.9,1.7,1),cex=1.1)\r\nqqnorm(residuals(model.basiclinearreg), main=\"\", ylab=\"\", las=1)\r\nmtext(\"Sample Quantiles\", side=2,at=20500,las=1,cex=1.1, adj=.5)\r\nqqnorm(residuals(model.Kenosha), main=\"\", ylab=\"\", las=1)\r\nmtext(\"Sample Quantiles\", side=2,at=9050,las=1,cex=1.1, adj=.5)\r\n\r\n<\/pre>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Figure 2.7 exhibits an outlier; the point in the upper left-hand side of the plot represents a zip code that includes Kenosha, Wisconsin. Sales for this zip code are unusually high given its population. 
Kenosha &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":3378,"menu_order":3,"comment_status":"closed","ping_status":"open","template":"","meta":{"jetpack_post_was_ever_published":false},"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P8cLPd-1fk","acf":[],"_links":{"self":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/4794"}],"collection":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/comments?post=4794"}],"version-history":[{"count":5,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/4794\/revisions"}],"predecessor-version":[{"id":4909,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/4794\/revisions\/4909"}],"up":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/3378"}],"wp:attachment":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/media?parent=4794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}