{"id":4777,"date":"2015-08-16T08:46:40","date_gmt":"2015-08-16T13:46:40","guid":{"rendered":"http:\/\/www.ssc.wisc.edu\/~jfrees\/?page_id=4777"},"modified":"2023-06-08T15:10:33","modified_gmt":"2023-06-08T20:10:33","slug":"basic-summary-statistics-and-normal-approximation","status":"publish","type":"page","link":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/regression\/chapter-1-regression-and-the-normal-distribution\/1-2-fitting-data-to-a-normal-distribution\/basic-summary-statistics-and-normal-approximation\/","title":{"rendered":"Basic Summary Statistics and Normal Approximation"},"content":{"rendered":"<p>For completeness, here are a few definitions. The <em>sample<\/em> is the set of data available for analysis, denoted by \\(y_1,\\ldots,y_n\\). Here, \\(n\\) is the number of observations, \\(y_1\\) represents the first observation, \\(y_2\\) the second, and so on up to \\(y_n\\) for the \\(n\\)th observation. A few important summary statistics follow. <\/p>\n<div class=\"scbb-content-box scbb-content-box-gray\"><em>Basic Summary Statistics<\/em> \n<ul>\n<li> The <em>mean<\/em> is the average of the observations, that is, the sum of the observations divided by the number of observations. Using algebraic notation, the mean is \\begin{equation*} \\overline{y}=\\frac{1}{n}\\left( y_1 + \\cdots + y_n \\right) = \\frac{1}{n} \\sum_{i=1}^{n} y_i. \\end{equation*} <\/li>\n<li> The <em>median<\/em> is the middle observation when the observations are ordered by size. That is, 50% of the observations lie below it and 50% lie above it. <\/li>\n<li> The <em>standard deviation<\/em> is a measure of the spread, or scale, of the distribution. It is computed as \\begin{equation*} s_y = \\sqrt{\\frac{1}{n-1}\\sum_{i=1}^{n}\\left( y_i-\\overline{y}\\right) ^{2}} . \\end{equation*} <\/li>\n<li> A <em>percentile<\/em> is a number below which a specified fraction of the observations falls, when the observations are ordered by size. 
For example, the 25th percentile is the number below which 25% of the observations fall. <\/li><\/ul>\n<p><\/p><\/div>\n<p> To help visualize the distribution, Figure 1.2 displays a <em>histogram<\/em> of the data. Here, the height of each rectangle shows the relative frequency of observations that fall within the range given by its base. The histogram provides a quick visual impression of the distribution; it shows that the range of the data is approximately (-4,4), that the central tendency is slightly greater than zero, and that the distribution is roughly symmetric. <\/p>\n<p> <strong>Normal Curve Approximation.<\/strong> Figure 1.2 also shows a normal curve superimposed, using \\(\\overline{y}\\) for \\(\\mu \\) and \\(s_y^{2}\\) for \\(\\sigma ^{2}\\). With the normal curve, only two quantities (\\(\\mu \\) and \\(\\sigma ^{2}\\)) are required to summarize the entire distribution. For example, Table 1.2 shows that 1.168 is the 75th percentile, which is approximately the 204th (= 0.75 \\(\\times\\) 272) observation when the sample is ordered from smallest to largest. Under the normal distribution, \\(z=(y-\\mu )\/\\sigma \\) follows a standard normal distribution, whose 75th percentile is approximately 0.675. Thus, \\(\\overline{y}+0.675 s_y = 0.481+0.675 \\times 1.101 = 1.224\\) is the 75th percentile using the normal curve approximation, reasonably close to the sample value of 1.168. <\/p>\n<figure id=\"attachment_269\" class=\"wp-caption aligncenter\" style=\"max-width: 300px;\" aria-label=\"Figure 1.2. 
Bodily Injury Relative Frequency with Normal Curve Superimposed.\"><a href=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/04\/F1BIHist.png\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.ssc.wisc.edu\/~jfrees\/wp-content\/uploads\/2015\/04\/F1BIHist.png\" alt=\"F1BIHist\" width=\"432\" height=\"288\" class=\"aligncenter size-full wp-image-3189\" srcset=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/04\/F1BIHist.png 432w, https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-content\/uploads\/2015\/04\/F1BIHist-300x200.png 300w\" sizes=\"(max-width: 432px) 100vw, 432px\" \/><\/a><figcaption class=\"wp-caption-text\"><br \/>\nFigure 1.2. Bodily Injury Relative Frequency with Normal Curve Superimposed.<\/figcaption><\/figure>\n<h2 style=\"text-align: center;\"><a id=\"displayText1.2f\" href=\"javascript:togglecode('toggleText1.2f','displayText1.2f');\"><i><strong>R Code for Figure 1.2 <\/strong><\/i><\/a> <\/h2>\n<div id=\"toggleText1.2f\" style=\"display: none\">\n<pre>\r\n<strong>R-Code<\/strong>\r\n# FIGURE 1.2: histogram of log claims with a fitted normal curve\r\ninjury &lt;- read.csv('http:\/\/instruction.bus.wisc.edu\/jfrees\/jfreesbooks\/Regression%20Modeling\/BookWebDec2010\/CSVData\/MassBodilyInjury.csv',header=TRUE)\r\n# Keep only the records with providerA != 0\r\ninjury2 &lt;- subset(injury, providerA != 0)\r\n# Log claims; assumes the claim amounts are stored in a column named claims\r\nLOGCLAIMS &lt;- log(injury2$claims)\r\npar(mar=c(4.2,3,.1,.1),cex=1.3)\r\n# Normal density on a grid, using the sample mean and standard deviation\r\nx &lt;- seq(-4, 4, 0.01)\r\ny &lt;- dnorm(x, mean=mean(LOGCLAIMS), sd=sd(LOGCLAIMS))\r\n# freq=FALSE puts the histogram on the density scale so the curve is comparable\r\nhist(LOGCLAIMS, freq=FALSE, main=\"\", ylab=\"\", las=1)\r\nmtext(\"Density\", side=2, at=.35,las=1, adj=.7,cex=1.4)\r\nlines(x,y, col='blue')\r\n<\/pre>\n<\/div>\n<p><div class=\"alignleft\"><a href=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/regression\/chapter-1-regression-and-the-normal-distribution\/1-2-fitting-data-to-a-normal-distribution\/example-massachusetts-bodily-injury-claims\/\" title=\"Example: Massachusetts Bodily Injury Claims\">&#9668; Previous page<\/a><\/div><div class=\"alignright\"><a 
href=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/regression\/chapter-1-regression-and-the-normal-distribution\/1-2-fitting-data-to-a-normal-distribution\/box-and-q-q-plots\/\" title=\"Box and qq Plots\">Next page &#9658;<\/a><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>For completeness, here are a few definitions. The sample is the set of data available for analysis, denoted by \\(y_1,\\ldots,y_n\\). Here, \\(n\\) is the number of observations, \\(y_1\\) represents the first observation, \\(y_2\\) the second, and so on up to \\(y_n\\) for the \\(n\\)th observation. Here are a few important summary statistics. Basic Summary Statistics&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":3183,"menu_order":2,"comment_status":"closed","ping_status":"open","template":"","meta":{"jetpack_post_was_ever_published":false},"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P8cLPd-1f3","acf":[],"_links":{"self":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/4777"}],"collection":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/comments?post=4777"}],"version-history":[{"count":8,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/4777\/revisions"}],"predecessor-version":[{"id":6510,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/4777\/revisions\/6510"}],"up":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/3183"}],"wp:attachment":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/media?parent=4777"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}