{"id":5958,"date":"2016-12-28T10:40:43","date_gmt":"2016-12-28T16:40:43","guid":{"rendered":"http:\/\/www.ssc.wisc.edu\/~jfrees\/?page_id=5958"},"modified":"2017-01-06T08:08:43","modified_gmt":"2017-01-06T14:08:43","slug":"3-5-maximum-likelihood-estimation","status":"publish","type":"page","link":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/loss-data-analytics\/chapter-3-modeling-loss-severity\/3-5-maximum-likelihood-estimation\/","title":{"rendered":"3.5 Maximum Likelihood Estimation"},"content":{"rendered":"<div class=\"scbb-content-box scbb-content-box-gray\">In this section we estimate statistical parameters using the method of maximum likelihood. Maximum likelihood estimates in the presence of grouping, truncation or censoring are calculated.<\/div>\n<h1>3.5.1 Maximum Likelihood Estimators for Complete Data<\/h1>\n<p>Pricing of insurance premiums and estimation of claim reserving are among many actuarial problems that involve modeling the severity of loss (claim size). The principles for using maximum likelihood to estimate model parameters were introduced in Chapter 2. In this\u00a0section,\u00a0we\u00a0present a few examples to illustrate how actuaries fit a parametric distribution model to a set of claim data using maximum likelihood. In these examples we derive the asymptotic variance\u00a0of\u00a0maximum-likelihood estimators of the model parameters. We  use the delta method to derive the asymptotic variances of functions of these parameters.<\/p>\n<p><strong>Example 3.21<\/strong> Consider a random sample of claim amounts: 8,000 10,000 12,000 15,000. 
You assume that claim amounts follow an inverse exponential distribution, with parameter $\theta$.<\/p>\n<ol>\n<li>Calculate the maximum likelihood estimator for $\theta$.<\/li>\n<li>Approximate the variance of the maximum likelihood estimator.<\/li>\n<li>Determine an approximate 95% confidence interval for $\theta$.<\/li>\n<li>Determine an approximate 95% confidence interval for $\Pr \left( X \leq 9,000 \right).$<\/li>\n<\/ol>\n<p><a id=\"displayText321\" href=\"javascript:toggle('toggleText321','displayText321');\"><i>Solution<\/i><\/a> <\/p>\n<div id=\"toggleText321\" style=\"display: none\">\n<hr \/>\n<p>The probability density function is<br \/>\n$$f_{X}\left( x \right) = \frac{\theta e^{- \frac{\theta}{x}}}{x^{2}}, $$<br \/>\nwhere $x > 0$. The likelihood function, $L\left( \theta \right)$, can be viewed as the probability of the observed data, written as a function of the model\u2019s parameter $\theta$:<br \/>\n$$L\left( \theta \right) = \prod_{i = 1}^{4}{f_{X_{i}}\left( x_{i} \right)} = \frac{\theta^{4}e^{- \theta\sum_{i = 1}^{4}\frac{1}{x_{i}}}}{\prod_{i = 1}^{4}x_{i}^{2}}.$$<\/p>\n<p>The loglikelihood function, $\ln L \left( \theta \right)$, is the sum of the individual logarithms:<br \/>\n$$\ln L \left( \theta \right) = 4\ln\theta - \theta\sum_{i = 1}^{4}\frac{1}{x_{i}} - 2\sum_{i = 1}^{4}\ln x_{i} .$$<\/p>\n<p>Its first derivative is<br \/>\n$$\frac{d\ln L \left( \theta \right)}{d\theta} = \frac{4}{\theta} - \sum_{i = 1}^{4}\frac{1}{x_{i}}.$$<br \/>\nThe maximum likelihood estimator of $\theta$, denoted by $\hat{\theta}$, is the solution to the equation<br \/>\n$$\frac{4}{\hat{\theta}} - \sum_{i = 1}^{4}{\frac{1}{x_{i}}} = 0.$$ Thus,<br \/>\n$\hat{\theta} = \frac{4}{\sum_{i = 1}^{4}\frac{1}{x_{i}}} = 10,667.$<\/p>\n<p>The second derivative of $\ln L \left( \theta \right)$ is given by<br \/>\n$$\frac{d^{2}\ln L\left( \theta \right)}{d\theta^{2}} = \frac{- 
4}{\theta^{2}}.$$<br \/>\nEvaluating the second derivative of the loglikelihood function at $\hat{\theta} = 10,667$ gives a negative value, confirming that $\hat{\theta}$ is the value that maximizes the loglikelihood function.<\/p>\n<p>Taking the reciprocal of the negative expectation of the second derivative of $\ln L \left( \theta \right)$, we obtain an estimate of the variance of $\hat{\theta}$:<br \/>\n$\widehat{Var}\left( \hat{\theta} \right) = \left. \ \left\lbrack - E\left( \frac{d^{2}\ln L \left( \theta \right)}{d\theta^{2}} \right) \right\rbrack^{- 1} \right|_{\theta = \hat{\theta}} = \frac{{\hat{\theta}}^{2}}{4} = 28,446,222$.<\/p>\n<p>It should be noted that as the sample size $n \rightarrow \infty$, the distribution of the maximum likelihood estimator $\hat{\theta}$ converges to a normal distribution with mean $\theta$ and a variance that is consistently estimated by $\widehat{Var}\left( \hat{\theta} \right)$. The approximate confidence interval in this example is based on this normal approximation, despite the small sample size, only for the purpose of illustration.<\/p>\n<p>The 95% confidence interval for $\theta$ is given by<br \/>\n$$10,667 \pm 1.96\sqrt{28,446,222} = \left( 213.34,\ 21,120.66 \right).$$<br \/>\nThe distribution function of $X$ is $F\left( x \right) = e^{- \frac{\theta}{x}}$. Then, the maximum likelihood estimate of $g\left( \theta \right) = F\left( 9,000 \right)$ is<br \/>\n$$g\left( \hat{\theta} \right) = e^{- \frac{10,667}{9,000}} = 0.306.$$<br \/>\nWe use the delta method to approximate the variance of $g\left( \hat{\theta} \right)$:<br \/>\n$$\frac{dg\left( \theta \right)}{d\theta} = - \frac{1}{9,000}e^{- \frac{\theta}{9,000}}.$$<\/p>\n<p>$\widehat{Var}\left\lbrack g\left( \hat{\theta} \right) \right\rbrack = \left( - \frac{1}{9,000}e^{- \frac{\hat{\theta}}{9,000}} \right)^{2}\widehat{Var}\left( \hat{\theta} \right) = 0.0328$.<\/p>\n<p>The 95% confidence interval for $F\left( 9,000 \right)$ is given by<br \/>\n$$0.306 \pm 1.96\sqrt{0.0328} = \left( - 0.049,\ 0.661 \right).$$<br \/>\nSince a probability cannot be negative, the negative lower limit may be replaced by zero.<\/p>\n<hr \/>\n<\/div>\n<p><strong>Example 3.22<\/strong> A random sample of size 6 is taken from a lognormal distribution with parameters $\mu$ and $\sigma$. The sample values are 200, 3,000, 8,000, 60,000, 60,000, 160,000.<\/p>\n<ol>\n<li>Calculate the maximum likelihood estimators for $\mu$ and $\sigma$.<\/li>\n<li>Estimate the covariance matrix of the maximum likelihood estimators.<\/li>\n<li>Determine approximate 95% confidence intervals for $\mu$ and $\sigma$.<\/li>\n<li>Determine an approximate 95% confidence interval for the mean of the lognormal distribution.<\/li>\n<\/ol>\n<p><a id=\"displayText322\" href=\"javascript:toggle('toggleText322','displayText322');\"><i>Solution<\/i><\/a> <\/p>\n<div id=\"toggleText322\" style=\"display: none\">\n<hr \/>\n<p>The probability density function is<br \/>\n$$f_{X}\left( x \right) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left\lbrack - \frac{1}{2}\left( \frac{\ln x - \mu}{\sigma} \right)^{2} \right\rbrack,$$<br \/>\nwhere $x > 0$. 
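The estimators derived in this solution turn out to have closed forms: $\hat{\mu}$ and $\hat{\sigma}^{2}$ are the sample mean and the biased (divide-by-$n$) sample variance of the logged data. The arithmetic in this example can be verified numerically; the following is a minimal sketch, not part of the original solution:

```python
import math

# Sample from Example 3.22
x = [200, 3_000, 8_000, 60_000, 60_000, 160_000]
logx = [math.log(v) for v in x]
n = len(x)

# MLEs: sample mean and biased (divide-by-n) variance of the logged data
mu_hat = sum(logx) / n                                 # ~ 9.38
sigma2_hat = sum((v - mu_hat) ** 2 for v in logx) / n  # ~ 5.12

# Fitted lognormal mean exp(mu + sigma^2/2), using the rounded
# estimates exactly as in the text
mu_r, s2_r = round(mu_hat, 2), round(sigma2_hat, 2)
mean_hat = math.exp(mu_r + s2_r / 2)                   # ~ 153,277

# Delta-method variance of the fitted mean, with the inverse Fisher
# information diag(sigma^2/6, sigma^2/12) evaluated at the estimates
grad = (mean_hat, math.sqrt(s2_r) * mean_hat)          # (dg/dmu, dg/dsigma)
var_mean = grad[0] ** 2 * s2_r / 6 + grad[1] ** 2 * s2_r / 12
```

The closed form makes numerical optimization unnecessary here; for severity models without closed-form maximum likelihood estimators, the same loglikelihood would instead be maximized numerically.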
The likelihood function, $L\\left( \\mu,\\sigma \\right)$, is the product of the pdf for each data point.<br \/>\n$$L\\left( \\mu,\\sigma \\right) = \\prod_{i = 1}^{6}{f_{X_{i}}\\left( x_{i} \\right)} = \\frac{1}{\\sigma^{6}\\left( 2\\pi \\right)^{3}\\prod_{i = 1}^{6}x_{i}}exp &#8211; \\frac{1}{2}\\sum_{i = 1}^{6}\\left( \\frac{\\ln x_{i} &#8211; \\mu}{\\sigma} \\right)^{2}.$$<br \/>\nThe loglikelihood function, $\\ln L \\left( \\mu,\\sigma \\right)$, is the sum of the individual logarithms.<br \/>\n$$\\ln \\left( \\mu,\\sigma \\right) = &#8211; 6ln\\sigma &#8211; 3ln\\left( 2\\pi \\right) &#8211; \\sum_{i = 1}^{6}\\ln x_{i} &#8211; \\frac{1}{2}\\sum_{i = 1}^{6}\\left( \\frac{\\ln x_{i} &#8211; \\mu}{\\sigma} \\right)^{2}.$$<br \/>\nThe first partial derivatives are<br \/>\n$$\\frac{\\partial lnL\\left( \\mu,\\sigma \\right)}{\\partial\\mu} = \\frac{1}{\\sigma^{2}}\\sum_{i = 1}^{6}\\left( \\ln x_{i} &#8211; \\mu \\right).$$<br \/>\n$$\\frac{\\partial lnL\\left( \\mu,\\sigma \\right)}{\\partial\\sigma} = \\frac{- 6}{\\sigma} + \\frac{1}{\\sigma^{3}}\\sum_{i = 1}^{6}\\left( \\ln x_{i} &#8211; \\mu \\right)^{2}.$$<br \/>\nThe maximum likelihood estimators of $\\mu$ and $\\sigma$, denoted by $\\hat{\\mu}$ and $\\hat{\\sigma}$, are the solutions to the equations<br \/>\n$$\\frac{1}{{\\hat{\\sigma}}^{2}}\\sum_{i = 1}^{6}\\left( lnx_{i} &#8211; \\hat{\\mu} \\right) = 0.$$<br \/>\n$$\\frac{- 6}{\\hat{\\sigma}} + \\frac{1}{{\\hat{\\sigma}}^{3}}\\sum_{i = 1}^{6}\\left( \\ln x_{i} &#8211; \\hat{\\mu} \\right)^{2} = 0.$$<br \/>\nThese yield the estimates<\/p>\n<p>$\\hat{\\mu} = \\frac{\\sum_{i = 1}^{6}{\\ln x_{i}}}{6} = 9.38$ and<br \/>\n${\\hat{\\sigma}}^{2} = \\frac{\\sum_{i = 1}^{6}\\left( \\ln x_{i} &#8211; \\hat{\\mu} \\right)^{2}}{6} = 5.12$.<\/p>\n<p>The second partial derivatives are<\/p>\n<p>$\\frac{\\partial^{2}\\text{lnL}\\left( \\mu,\\sigma \\right)}{\\partial\\mu^{2}} = \\frac{- 6}{\\sigma^{2}}$,<br \/>\n$\\frac{\\partial^{2}\\text{lnL}\\left( \\mu,\\sigma 
\\right)}{\\partial\\mu\\partial\\sigma} = \\frac{- 2}{\\sigma^{3}}\\sum_{i = 1}^{6}\\left( \\ln x_{i} &#8211; \\mu \\right)$<br \/>\nand<br \/>\n$\\frac{\\partial^{2}\\text{lnL}\\left( \\mu,\\sigma \\right)}{\\partial\\sigma^{2}} = \\frac{6}{\\sigma^{2}} &#8211; \\frac{3}{\\sigma^{4}}\\sum_{i = 1}^{6}\\left( \\ln x_{i} &#8211; \\mu \\right)^{2}$.<\/p>\n<p>To derive the covariance matrix of the mle we need to find the expectations of the second derivatives. Since the random variable $X$ is from a lognormal distribution with parameters $\\mu$ and $\\sigma$, then $\\text{lnX}$ is normally distributed with mean $\\mu$ and variance $\\sigma^{2}$.<\/p>\n<p>$E\\left( \\frac{\\partial^{2}\\text{lnL}\\left( \\mu,\\sigma \\right)}{\\partial\\mu^{2}} \\right) = E\\left( \\frac{- 6}{\\sigma^{2}} \\right) = \\frac{- 6}{\\sigma^{2}}$,<\/p>\n<p>$E\\left( \\frac{\\partial^{2}\\text{lnL}\\left( \\mu,\\sigma \\right)}{\\partial\\mu\\partial\\sigma} \\right) = \\frac{- 2}{\\sigma^{3}}\\sum_{i = 1}^{6}{E\\left( \\ln x_{i} &#8211; \\mu \\right)} = \\frac{- 2}{\\sigma^{3}}\\sum_{i = 1}^{6}\\left\\lbrack E\\left( \\ln x_{i} \\right) &#8211; \\mu \\right\\rbrack$=$\\frac{- 2}{\\sigma^{3}}\\sum_{i = 1}^{6}\\left( \\mu &#8211; \\mu \\right) = 0$,<\/p>\n<p>and<\/p>\n<p>$E\\left( \\frac{\\partial^{2}\\text{lnL}\\left( \\mu,\\sigma \\right)}{\\partial\\sigma^{2}} \\right) = \\frac{6}{\\sigma^{2}} &#8211; \\frac{3}{\\sigma^{4}}\\sum_{i = 1}^{6}{E\\left( \\ln x_{i} &#8211; \\mu \\right)}^{2} = \\frac{6}{\\sigma^{2}} &#8211; \\frac{3}{\\sigma^{4}}\\sum_{i = 1}^{6}{V\\left( \\ln x_{i} \\right) = \\frac{6}{\\sigma^{2}} &#8211; \\frac{3}{\\sigma^{4}}\\sum_{i = 1}^{6}{\\sigma^{2} = \\frac{- 12}{\\sigma^{2}}}}$.<\/p>\n<p>Using the negatives of these expectations we obtain the Fisher information matrix $\\begin{bmatrix}<br \/>\n\\frac{6}{\\sigma^{2}} &#038; 0 \\\\<br \/>\n0 &#038; \\frac{12}{\\sigma^{2}} \\\\<br \/>\n\\end{bmatrix}$.<\/p>\n<p>The covariance matrix, $\\Sigma$, is the inverse of the 
Fisher information matrix $\\Sigma = \\begin{bmatrix}<br \/>\n\\frac{\\sigma^{2}}{6} &#038; 0 \\\\<br \/>\n0 &#038; \\frac{\\sigma^{2}}{12} \\\\<br \/>\n\\end{bmatrix}$.<\/p>\n<p>The estimated matrix is given by $\\hat{\\Sigma} = \\begin{bmatrix}<br \/>\n0.8533 &#038; 0 \\\\<br \/>\n0 &#038; 0.4267 \\\\<br \/>\n\\end{bmatrix}$.<\/p>\n<p>The 95% confidence interval for $\\mu$ is given by $9.38 \\pm 1.96\\sqrt{0.8533} = \\left( 7.57,\\ 11.19 \\right)$.<\/p>\n<p>The 95% confidence interval for $\\sigma^{2}$ is given by $5.12 \\pm 1.96\\sqrt{0.4267} = \\left( 3.84,\\ 6.40 \\right)$.<\/p>\n<p>The mean of *X* is $\\exp\\left( \\mu + \\frac{\\sigma^{2}}{2} \\right)$. Then, the maximum likelihood estimate of<br \/>\n$$g\\left( \\mu,\\sigma \\right) = \\exp\\left( \\mu + \\frac{\\sigma^{2}}{2} \\right)$$<br \/>\nis<br \/>\n$$g\\left( \\hat{\\mu},\\hat{\\sigma} \\right) = \\exp\\left( \\hat{\\mu} + \\frac{{\\hat{\\sigma}}^{2}}{2} \\right) = 153,277.$$<\/p>\n<p>We use the delta method to approximate the variance of the mle<br \/>\n$g\\left( \\hat{\\mu},\\hat{\\sigma} \\right)$.<\/p>\n<p>$\\frac{\\partial g\\left( \\mu,\\sigma \\right)}{\\partial\\mu} = exp\\left( \\mu + \\frac{\\sigma^{2}}{2} \\right)$<br \/>\nand<br \/>\n$\\frac{\\partial g\\left( \\mu,\\sigma \\right)}{\\partial\\sigma} = \\sigma exp\\left( \\mu + \\frac{\\sigma^{2}}{2} \\right)$.<\/p>\n<p>Using the delta method, the approximate variance of<br \/>\n$g\\left( \\hat{\\mu},\\hat{\\sigma} \\right)$ is given by<\/p>\n<p>$$\\left. 
\\ \\hat{V}\\left( g\\left( \\hat{\\mu},\\hat{\\sigma} \\right) \\right) = \\begin{bmatrix}<br \/>\n\\frac{\\partial g\\left( \\mu,\\sigma \\right)}{\\partial\\mu} &#038; \\frac{\\partial g\\left( \\mu,\\sigma \\right)}{\\partial\\sigma} \\\\<br \/>\n\\end{bmatrix}\\Sigma\\begin{bmatrix}<br \/>\n\\frac{\\partial g\\left( \\mu,\\sigma \\right)}{\\partial\\mu} \\\\<br \/>\n\\frac{\\partial g\\left( \\mu,\\sigma \\right)}{\\partial\\sigma} \\\\<br \/>\n\\end{bmatrix} \\right|_{\\mu = \\hat{\\mu},\\sigma = \\hat{\\sigma}}$$<\/p>\n<p>$= \\begin{bmatrix}<br \/>\n153,277 &#038; 346,826 \\\\<br \/>\n\\end{bmatrix}\\begin{bmatrix}<br \/>\n0.8533 &#038; 0 \\\\<br \/>\n0 &#038; 0.4267 \\\\<br \/>\n\\end{bmatrix}\\begin{bmatrix}<br \/>\n153,277 \\\\<br \/>\n346,826 \\\\<br \/>\n\\end{bmatrix} =$71,374,380,000<\/p>\n<p>The 95% confidence interval for $\\exp\\left( \\mu + \\frac{\\sigma^{2}}{2} \\right)$ is given by<\/p>\n<p>$153,277 \\pm 1.96\\sqrt{71,374,380,000} = \\left( &#8211; 370,356,\\ 676,910 \\right)$.<\/p>\n<p>Since the mean of the lognormal distribution cannot be negative, we should replace the negative lower limit in the previous interval by a zero.<\/p>\n<hr \/>\n<\/div>\n<h1>3.5.2 Maximum Likelihood Estimators for Grouped Data<\/h1>\n<p>In the previous section we considered the maximum likelihood estimation of continuous models from complete (individual) data. Each individual observation is recorded, and its contribution to the likelihood function is the density at that value. In this section we consider the problem of obtaining maximum likelihood estimates of parameters from grouped data. The observations are only available in grouped form, and the contribution of each observation to the likelihood function is the probability of falling in a specific group (interval). Let $n_{j}$ represent the number of observations in the interval $\\left( \\left. 
\ c_{j - 1},c_{j} \right\rbrack \right.$. The grouped data likelihood function is thus given by<br \/>\n$$L\left( \theta \right) = \prod_{j = 1}^{k}\left\lbrack F\left( \left. \ c_{j} \right|\theta \right) - F\left( \left. \ c_{j - 1} \right|\theta \right) \right\rbrack^{n_{j}},$$<br \/>\nwhere $c_{0}$ is the smallest possible observation (often set to zero) and $c_{k}$ is the largest possible observation (often set to infinity).<\/p>\n<p><strong>Example 3.23 (SOA)<\/strong> For a group of policies, you are given that losses follow the distribution function $F\left( x \right) = 1 - \frac{\theta}{x}$, for $\theta \lt x \lt \infty.$ Further, a sample of 20 losses resulted in the following:<br \/>\n$$<br \/>\n{\small \begin{matrix}\hline<br \/>\n\text{Interval} &#038; \text{Number of Losses} \\ \hline<br \/>\nx \leq 10 &#038; 9 \\<br \/>\n10 \lt x \leq 25 &#038; 6 \\<br \/>\nx \gt 25 &#038; 5 \\\hline<br \/>\n\end{matrix}}$$ Calculate the maximum likelihood estimate of $\theta$.<\/p>\n<p><a id=\"displayText323\" href=\"javascript:toggle('toggleText323','displayText323');\"><i>Solution<\/i><\/a> <\/p>\n<div id=\"toggleText323\" style=\"display: none\">\n<hr \/>\n<p>The contribution of each of the 9 observations in the first interval to the likelihood function is the probability that $X \leq 10$; that is, $\Pr\left( X \leq 10 \right) = F\left( 10 \right)$. Similarly, the contributions of each of the 6 and 5 observations in the second and third intervals are $\Pr\left( 10 \lt X \leq 25 \right) = F\left( 25 \right) - F(10)$ and $\Pr\left( X \gt 25 \right) = 1 - F(25)$, respectively. 
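These interval probabilities can also be assembled into the grouped-data loglikelihood and maximized numerically, which is a useful check on the closed-form derivation. A minimal sketch (the grid search is purely illustrative):

```python
import math

# Grouped losses from Example 3.23: 9 in (0, 10], 6 in (10, 25], 5 in (25, inf),
# with model F(x) = 1 - theta/x for theta < x
def loglik(theta):
    p1 = 1 - theta / 10            # Pr(X <= 10)
    p2 = theta / 10 - theta / 25   # Pr(10 < X <= 25)
    p3 = theta / 25                # Pr(X > 25)
    return 9 * math.log(p1) + 6 * math.log(p2) + 5 * math.log(p3)

# Grid search over the admissible range 0 < theta < 10
grid = [t / 1000 for t in range(1, 10_000)]
theta_hat = max(grid, key=loglik)
print(theta_hat)  # 5.5
```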
The likelihood function is thus given by<br \/>\n$$L\left( \theta \right) = \left\lbrack F\left( 10 \right) \right\rbrack^{9}\left\lbrack F\left( 25 \right) - F(10) \right\rbrack^{6}\left\lbrack 1 - F(25) \right\rbrack^{5}$$<br \/>\n$${= \left( 1 - \frac{\theta}{10} \right)}^{9}\left( \frac{\theta}{10} - \frac{\theta}{25} \right)^{6}\left( \frac{\theta}{25} \right)^{5}$$<br \/>\n$${= \left( \frac{10 - \theta}{10} \right)}^{9}\left( \frac{15\theta}{250} \right)^{6}\left( \frac{\theta}{25} \right)^{5}.$$<br \/>\nThen,<br \/>\n$$\ln L \left( \theta \right) = 9\ln\left( 10 - \theta \right) + 6\ln\theta + 5\ln\theta - 9\ln 10 + 6\ln 15 - 6\ln 250 - 5\ln 25$$<br \/>\nand<br \/>\n$$\frac{d\ln L \left( \theta \right)}{d\theta} = \frac{- 9}{\left( 10 - \theta \right)} + \frac{6}{\theta} + \frac{5}{\theta}.$$<br \/>\nThe maximum likelihood estimator, $\hat{\theta}$, is the solution to the equation<br \/>\n$$\frac{- 9}{\left( 10 - \hat{\theta} \right)} + \frac{11}{\hat{\theta}} = 0,$$<br \/>\nwhich yields $\hat{\theta} = 5.5$.<\/p>\n<hr \/>\n<\/div>\n<h1>3.5.3 Maximum Likelihood Estimators for Censored Data<\/h1>\n<p>Another distinguishing feature of the data gathering mechanism is censoring. While for some events of interest (losses, claims, lifetimes, etc.) the complete data may be available, for others only partial information is available: the information that the observation exceeds a specific value. The limited policy introduced in Section 3.4.2 is an example of right censoring. Any loss greater than or equal to the policy limit is recorded at the limit. The contribution of a censored observation to the likelihood function is the probability of the random variable exceeding this specific limit. 
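In code, a right-censored loglikelihood simply switches between the log-density and the log-survival function. The sketch below uses a hypothetical exponential severity model, not one from this section's examples, purely to illustrate the pattern:

```python
import math

def censored_loglik(lam, data):
    """Loglikelihood for exponential data with right censoring.

    data holds (value, censored) pairs: a censored observation contributes
    log S(x) = -lam*x, a complete one contributes log f(x) = log(lam) - lam*x.
    """
    ll = 0.0
    for x, censored in data:
        ll += -lam * x if censored else math.log(lam) - lam * x
    return ll

# Two exact losses and one loss censored at a policy limit of 10
sample = [(4.0, False), (7.0, False), (10.0, True)]

# For the exponential, the censored MLE has a closed form:
# (number of uncensored observations) / (total of all recorded values)
lam_hat = sum(1 for _, c in sample if not c) / sum(x for x, _ in sample)

# The closed form should (approximately) maximize the loglikelihood
assert all(censored_loglik(lam_hat, sample) >= censored_loglik(lam_hat + d, sample)
           for d in (-0.01, 0.01))
```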
Note that the contributions of both complete and censored observations share the survival function; since $f_{X}\left( x \right) = h_{X}\left( x \right)S_{X}\left( x \right)$, a complete observation contributes the survival function multiplied by the hazard function, while a censored observation contributes the survival function alone.<\/p>\n<p><strong>Example 3.24 (SOA)<\/strong> The random variable $X$ has survival function:<br \/>\n$$S_{X}\left( x \right) = \frac{\theta^{4}}{\left( \theta^{2} + x^{2} \right)^{2}}.$$<br \/>\nTwo values of $X$ are observed to be 2 and 4. One other value exceeds 4.<br \/>\nCalculate the maximum likelihood estimate of $\theta$.<br \/>\n<a id=\"displayText324\" href=\"javascript:toggle('toggleText324','displayText324');\"><i>Solution<\/i><\/a> <\/p>\n<div id=\"toggleText324\" style=\"display: none\">\n<hr \/>\n<p>The contributions of the two observations 2 and 4 are $f_{X}\left( 2 \right)$ and $f_{X}\left( 4 \right)$, respectively. The contribution of the third observation, which is only known to exceed 4, is $S_{X}\left( 4 \right)$. The likelihood function is thus given by<br \/>\n$$L\left( \theta \right) = f_{X}\left( 2 \right)f_{X}\left( 4 \right)S_{X}\left( 4 \right).$$<br \/>\nThe probability density function of $X$ is given by<br \/>\n$$f_{X}\left( x \right) = \frac{4x\theta^{4}}{\left( \theta^{2} + x^{2} \right)^{3}}.$$<br \/>\nThus,<br \/>\n$$L\left( \theta \right) = \frac{8\theta^{4}}{\left( \theta^{2} + 4 \right)^{3}}\frac{16\theta^{4}}{\left( \theta^{2} + 16 \right)^{3}}\frac{\theta^{4}}{\left( \theta^{2} + 16 \right)^{2}} = \\<br \/>\n\frac{128\theta^{12}}{\left( \theta^{2} + 4 \right)^{3}\left( \theta^{2} + 16 \right)^{5}},$$<\/p>\n<p>$\ln L\left( \theta \right) = \ln 128 + 12\ln\theta - 3\ln\left( \theta^{2} + 4 \right) - 5\ln\left( \theta^{2} + 16 \right)$,<\/p>\n<p>and<\/p>\n<p>$\frac{d\ln L\left( \theta \right)}{d\theta} = \frac{12}{\theta} - \frac{6\theta}{\left( \theta^{2} + 4 \right)} - \frac{10\theta}{\left( \theta^{2} + 16 
\right)}$.<\/p>\n<p>The maximum likelihood estimator, $\hat{\theta}$, is the solution to the equation<br \/>\n$$\frac{12}{\hat{\theta}} - \frac{6\hat{\theta}}{\left( {\hat{\theta}}^{2} + 4 \right)} - \frac{10\hat{\theta}}{\left( {\hat{\theta}}^{2} + 16 \right)} = 0$$<br \/>\nor<br \/>\n$$12\left( {\hat{\theta}}^{2} + 4 \right)\left( {\hat{\theta}}^{2} + 16 \right) - 6{\hat{\theta}}^{2}\left( {\hat{\theta}}^{2} + 16 \right) - 10{\hat{\theta}}^{2}\left( {\hat{\theta}}^{2} + 4 \right) = \\<br \/>\n- 4{\hat{\theta}}^{4} + 104{\hat{\theta}}^{2} + 768 = 0,$$<br \/>\nwhich yields ${\hat{\theta}}^{2} = 32$ and $\hat{\theta} = 5.7$.<\/p>\n<hr \/>\n<\/div>\n<h1>3.5.4 Maximum Likelihood Estimators for Truncated Data<\/h1>\n<p>This section is concerned with maximum likelihood estimation of the continuous distribution of the random variable $X$ when the data are incomplete due to truncation. If the values of $X$ are truncated at $d$, then we would not have been aware of the existence of these values had they not exceeded $d$. The policy deductible introduced in Section 3.4.1 is an example of left truncation. Any loss less than or equal to the deductible is not recorded. The contribution to the likelihood function of an observation $x$ truncated at $d$ is a conditional density: $f_{X}\left( x \right)$ is replaced by $\frac{f_{X}\left( x \right)}{S_{X}\left( d \right)}$.<\/p>\n<p><strong>Example 3.25 (SOA)<\/strong> For the single-parameter Pareto distribution with $\theta = 2$, maximum likelihood estimation is applied to estimate the parameter $\alpha$. 
Find the estimated mean of the ground-up loss distribution, based on the maximum likelihood estimate of $\alpha$, for the following data set:<\/p>\n<ul>\n<li>Ordinary policy deductible of 5, maximum covered loss of 25 (policy limit 20)<\/li>\n<li>8 insurance payment amounts: 2, 4, 5, 5, 8, 10, 12, 15<\/li>\n<li>2 limit payments: 20, 20.<\/li>\n<\/ul>\n<p><a id=\"displayText325\" href=\"javascript:toggle('toggleText325','displayText325');\"><i>Solution<\/i><\/a> <\/p>\n<div id=\"toggleText325\" style=\"display: none\">\n<hr \/>\n<p>The contributions of the different observations can be summarized as follows:<\/p>\n<ul>\n<li>For each exact loss $x$: $f_{X}\left( x \right)$.<\/li>\n<li>For each censored observation: $S_{X}\left( 25 \right)$.<\/li>\n<li>For truncation at the deductible: each contribution is divided by $S_{X}\left( 5 \right)$.<\/li>\n<\/ul>\n<p>Given that ground-up losses smaller than 5 are omitted from the data set, the contribution of every observation should be conditional on exceeding 5. The likelihood function becomes<br \/>\n$$L\left( \alpha \right) = \frac{\prod_{i = 1}^{8}{f_{X}\left( x_{i} \right)}}{\left\lbrack S_{X}\left( 5 \right) \right\rbrack^{8}}\left\lbrack \frac{S_{X}\left( 25 \right)}{S_{X}\left( 5 \right)} \right\rbrack^{2}.$$<br \/>\nFor the single-parameter Pareto, the probability density and distribution functions are given by<\/p>\n<p>$$f_{X}\left( x \right) = \frac{\alpha\theta^{\alpha}}{x^{\alpha + 1}} \ \ \text{and} \ \ F_{X}\left( x \right) = 1 - \left( \frac{\theta}{x} \right)^{\alpha},$$<br \/>\nfor $x > \theta$, respectively. 
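With the density and distribution function in hand, the truncated-and-censored likelihood of this example can also be maximized numerically as a check on the algebra that follows. A minimal sketch (the grid search is illustrative):

```python
import math

# Ground-up losses (payment + deductible of 5); two further losses censored at 25
exact = [7, 9, 10, 10, 13, 15, 17, 20]
theta, d, u, n_cens = 2, 5, 25, 2

def loglik(alpha):
    # Exact losses: log f(x) = log(alpha) + alpha*log(theta) - (alpha+1)*log(x)
    ll = sum(math.log(alpha) + alpha * math.log(theta)
             - (alpha + 1) * math.log(x) for x in exact)
    ll -= len(exact) * alpha * math.log(theta / d)  # truncation: divide by S(5)^8
    ll += n_cens * alpha * math.log(d / u)          # censoring: [S(25)/S(5)]^2
    return ll

# Grid search over 0 < alpha < 3
grid = [a / 1000 for a in range(1, 3000)]
alpha_hat = max(grid, key=loglik)
print(round(alpha_hat, 3))  # 0.785
```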
Then, the likelihood and loglikelihood functions are given by<br \/>\n$$L\left( \alpha \right) = \frac{\alpha^{8}}{\prod_{i = 1}^{8}x_{i}^{\alpha + 1}}\frac{5^{10\alpha}}{25^{2\alpha}},$$<br \/>\n$$\ln L \left( \alpha \right) = 8\ln\alpha - \left( \alpha + 1 \right)\sum_{i = 1}^{8}{\ln x_{i}} + 10\alpha \ln 5 - 2\alpha \ln 25.$$<\/p>\n<p>$\frac{d\ln L\left( \alpha \right)}{d\alpha} = \frac{8}{\alpha} - \sum_{i = 1}^{8}{\ln x_{i}} + 10\ln 5 - 2\ln 25$.<\/p>\n<p>The maximum likelihood estimator, $\hat{\alpha}$, is the solution to the equation<br \/>\n$$\frac{8}{\hat{\alpha}} - \sum_{i = 1}^{8}{\ln x_{i}} + 10\ln 5 - 2\ln 25 = 0,$$ which yields<br \/>\n$$\hat{\alpha} = \frac{8}{\sum_{i = 1}^{8}{\ln x_{i}} - 10\ln 5 + 2\ln 25} = \frac{8}{(\ln 7 + \ln 9 + \ldots + \ln 20) - 10\ln 5 + 2\ln 25} = 0.785.$$<br \/>\nThe mean of the Pareto exists only for $\alpha > 1$. Since $\hat{\alpha} = 0.785 \lt 1$, the mean does not exist.<\/p>\n<hr \/>\n<\/div>\n<h1>3.6 Concluding Remarks<\/h1>\n<p>In describing losses, actuaries fit appropriate parametric distribution models for the frequency and severity of loss. This involves finding statistical distributions that can efficiently model the data at hand. After fitting a distribution model to a data set, the model should be validated. Model validation is a crucial step in the model building sequence. It assesses how well the statistical distributions fit the data and how well we can expect the model to perform in the future. If the selected\u00a0model\u00a0does not\u00a0fit\u00a0the data, another distribution\u00a0should be chosen. If more than one model seems to be a good fit for the data, we then have to choose which model to use. It should be noted, though, that the same data should not serve for both purposes (fitting and validating the model). 
Additional data should be used to assess the performance of the model. There are many statistical tools for model validation. Goodness-of-fit tests, used to determine whether sample data are consistent with the candidate model, will be presented in a separate chapter.<\/p>\n<h1>Further Readings and References<\/h1>\n<ul>\n<li>Cummins, J. D. and Derrig, R. A. 1991. <em>Managing the Insolvency Risk of Insurance Companies<\/em>, Springer Science+Business Media, LLC.<\/li>\n<li>Frees, E. W. and Valdez, E. A. 2008. Hierarchical insurance claims modeling, <em>Journal of the American Statistical Association<\/em>, 103, 1457-1469.<\/li>\n<li>Klugman, S. A., Panjer, H. H. and Willmot, G. E. 2008. <em>Loss Models: From Data to Decisions<\/em>, Wiley.<\/li>\n<li>Kreer, M., K\u0131z\u0131lers\u00fc, A., Thomas, A. W. and Eg\u00eddio dos Reis, A. D. 2015. Goodness-of-fit tests and applications for left-truncated Weibull distributions to non-life insurance, <em>European Actuarial Journal<\/em>, 5, 139\u2013163.<\/li>\n<li>McDonald, J. B. 1984. Some generalized functions for the size distribution of income, <em>Econometrica<\/em> 52, 647\u2013663.<\/li>\n<li>McDonald, J. B. and Xu, Y. J. 1995. A generalization of the beta distribution with applications, <em>Journal of Econometrics<\/em> 66, 133\u2013152.<\/li>\n<li>Tevet, D. 2016. Applying generalized linear models to insurance data: Frequency\/severity versus premium modeling, in: Frees, E. W., Derrig, R. A. and Meyers, G. (Eds.) <em>Predictive Modeling Applications in Actuarial Science<\/em>, Vol. II: Case Studies in Insurance. Cambridge University Press.<\/li>\n<li>Venter, G. 1983. Transformed beta and gamma distributions and aggregate losses. 
<em>Proceedings of the Casualty Actuarial Society<\/em> 70: 156\u2013193.<\/li>\n<\/ul>\n<p><div class=\"alignleft\"><a href=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/loss-data-analytics\/chapter-3-modeling-loss-severity\/coverage-modifications\/\" title=\"3.4 Coverage Modifications\">&#9668 Previous page<\/a><\/div><div class=\"alignright\"><a href=\"https:\/\/users.ssc.wisc.edu\/~ewfrees\/loss-data-analytics\/chapter-3-modeling-loss-severity\/loss-data-analytics-severity-problems\/\" title=\"Severity Guided Tutorials\">Next page &#9658<\/a><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>3.5.1 Maximum Likelihood Estimators for Complete Data Pricing of insurance premiums and estimation of claim reserving are among many actuarial problems that involve modeling the severity of loss (claim size). The principles for using maximum &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":5910,"menu_order":5,"comment_status":"closed","ping_status":"closed","template":"","meta":{"jetpack_post_was_ever_published":false},"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P8cLPd-1y6","acf":[],"_links":{"self":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/5958"}],"collection":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/comments?post=5958"}],"version-history":[{"count":15,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/5958\/revisions"}],"predecessor-version":[{"id":6146,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/5958\/revisions\/6146"}],"up":[{"embeddable":true,"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/pages\/5
910"}],"wp:attachment":[{"href":"https:\/\/users.ssc.wisc.edu\/~ewfrees\/wp-json\/wp\/v2\/media?parent=5958"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}