Chapter 10 Sensitivity and Data Uncertainty

Chapter Preview. In Chapter 9, we explored how optimal values of retained risk measures, our objective functions, change as assumptions vary. Although this understanding is helpful, it is the retention parameters, and hence decision variables, that are contractually documented and likely to be scrutinized carefully. This raises the question: to what extent will the optimal values of the decision variables change as assumptions vary?

Perturbation Sensitivity Proposition. To address this question, this chapter develops an extension of the Envelope theorem that provides explicit expressions for optimal decision variable sensitivities based on small perturbations of an auxiliary variable. We refer to this extension, presented in Section 10.1, as the Perturbation Sensitivity Proposition. It is derived by taking additional derivatives and thus assumes additional smoothness of the objective and constraint functions. Although the expressions are complex, with this extension we are able to compute risk retention sensitivities directly in many cases without the need for additional optimization runs. To demonstrate the usefulness of this proposition, we revisit some of the examples from Chapter 9 and show how to supplement previous results with information about the decision variable sensitivity.

In addition to the examples already discussed, risk managers are eager to understand the robustness of results to variations in the risk distribution parameter assumptions. However, a marginal distribution parameter affects both risk transfer costs and risk measures. Because this is an instance in which the objective and constraint functions both depend on an auxiliary variable, insights from the Chapter 9 results alone are difficult to come by. As will be seen in Section 10.2, we can readily handle this type of auxiliary variable with the Perturbation Sensitivity Proposition.

Curvature. To explain why decision variable uncertainty is important for some risk retention problems but not others, Section 10.3 summarizes an investigation into the curvature of the associated Lagrangian. This section shows how to use a quantity known as a condition number from numerical analysis to describe the curvature. From this investigation, we learn that excess of loss retention programs are robust in contrast to quota share programs, which are more susceptible to decision variable uncertainty.

Data Uncertainty. Variations in risk distribution parameters may be due to vagueness of the analyst’s perception of the distribution but may also be due to the uncertainty that arises from the variability in a data generating process. Here, data uncertainty refers to the observation that small changes in the risk distribution parameters can lead to large material changes in optimal risk portfolio coefficients. Section 10.4 provides an introduction to data uncertainty and shows how to calibrate it with what we call a stochastic sensitivity.

10.1 Decision Variable Sensitivities

This section introduces a proposition that can be used to calibrate the sensitivity of the optimal decision variables to changes in exogenous (non-decision) variables. To this end, let us revisit the basic problem in Display (9.2) which introduces a variable, “\(a\)”, as auxiliary into the optimization problem and represents the assumption we wish to vary. In applications developed in Section 10.2, \(a\) primarily represents an element of the risk distribution parameter vector \(\boldsymbol \gamma\) but can also be, for example, the level of an acceptable budget.

Table 10.1 summarizes the notation introduced in Section 9.2 and provides some extensions needed for this development. In particular, we use the notation \(f_0({\bf z},a)\) and \(f_{con,j}({\bf z},a)\) to emphasize that both the objective and constraint functions may depend on \(a\). As before, we require smoothness of the objective and constraint functions in the auxiliary variable, with partial derivatives as displayed in Table 10.1. Recall that there are \(p_z\) decision variables, and we only utilize the \(m\) active constraints at the optimum.

Table 10.1. Perturbation Sensitivity Notation Summary
\[ \small{ \begin{matrix} \begin{array}{l|l} \hline \hline
\textbf{Description} & \textbf{Mathematical Expression} \\ \hline
\textbf{Objective Function} & f_{0}({\bf z},a) \\
\text{Partial Derivatives} \textit{ wrt } \text{decision variables} & f_{0,{\bf z} }({\bf z},a)=\partial_{\bf z} f_{0}({\bf z},a) \\
\text{Partial Derivative} \textit{ wrt } \text{auxiliary variable} & f_{0, a}({\bf z},a)= \partial_{a} f_{0}({\bf z},a) \\
\text{Partial Derivatives} \textit{ wrt } & f_{0,{\bf z} a}({\bf z},a)=\partial_{\bf z} \partial_{a} f_{0}({\bf z},a) \\
~~~~~~\text{ decision and auxiliary variables} & \\
\text{Hessian of the Objective Function} & f_{0,{\bf z} {\bf z}'}({\bf z},a)=\partial_{\bf z} \partial_{\bf z'} f_{0}({\bf z},a) \\ \hline
\textbf{Constraint Functions} & f_{con,j}({\bf z},a) \\
\text{Partial Derivatives} \textit{ wrt } \text{decision variables} & f_{con,j,{\bf z}}({\bf z},a)=\partial_{\bf z} f_{con,j}({\bf z},a) \\
\text{Partial Derivative} \textit{ wrt } \text{auxiliary variable} & f_{con,j,a}({\bf z},a)=\partial_{a} f_{con,j}({\bf z},a) \\
\text{Partial Derivatives} \textit{ wrt } & f_{con,j,{\bf z} a}({\bf z},a)=\partial_{\bf z} \partial_{a} f_{con,j}({\bf z},a) \\
~~~~~~\text{ decision and auxiliary variables} & \\ \hline
\textbf{Shortened Lagrangian} & SLA({\bf z},a,{\bf LME}) = f_{0}({\bf z},a)\\
& ~~~~~~+ \sum_{j=1}^m LME_j ~ f_{con,j}({\bf z},a) \\
\text{Partial Derivatives} \textit{ wrt } & SLA_{{\bf z} a}({\bf z},a,{\bf LME}) = \\
~~~~~~\text{decision and auxiliary variables} & ~~~~~~\partial_{\bf z} \partial_{a}SLA({\bf z},a,{\bf LME}) \\
\text{Hessian of the Shortened Lagrangian} & SLA_{{\bf z} {\bf z}'}({\bf z},a,{\bf LME}) = \\
& ~~~~~~\partial_{\bf z} \partial_{\bf z'} SLA({\bf z},a,{\bf LME}) \\ \hline
\end{array} \\
\textit{Note: wrt} \text{ means } ~``\text{with respect to}" \end{matrix} } \]

To economize on notation, we collect the constraint derivatives using matrix notation as \(f_{con,\bullet,a}({\bf z},a)\) \(=\left[f_{con,1,a}({\bf z},a), \ldots, f_{con,m,a}({\bf z},a)\right]'\), an \(m \times 1\) vector, and \(f_{con,\bullet,{\bf z}}({\bf z},a)\) \(=\left[f_{con,1,{\bf z}}({\bf z},a), \ldots, f_{con,m,{\bf z}}({\bf z},a)\right]\), a \(p_z \times m\) matrix.

To present results, recall the shortened version of the Lagrangian, \(SLA({\bf z},a,{\bf LME})=\) \(f_0({\bf z},a) + \sum_{j=1}^m LME_j ~f_{con,j}({\bf z},a)\), where \(LME_j\) are Lagrange multipliers, which considers only those constraints that are active at the optimum. Table 10.1 introduces the Hessian of the shortened Lagrangian \(SLA_{{\bf z} {\bf z}'}({\bf z},a,{\bf LME})\), a \(p_z \times p_z\) matrix. In the same way, taking mixed second-order partial derivatives yields the \(p_z \times 1\) vector \(SLA_{{\bf z} a}({\bf z},a,{\bf LME})\).

At the optimum, \({\bf LME}^*(a)={\bf LME}^*\) and \({\bf z}^*(a)={\bf z}^*\) are values of the Lagrange multipliers and the decision variables (both of which depend on \(a\)). Our interest is in the set of sensitivities that we now express as vectors \[ \partial_{a} {\bf z}^* = \left( \begin{array}{c} \partial_{a} z_1^*(a) \\ \vdots \\ \partial_{a} z_{p_z}^*(a) \end{array} \right) ~~~~ \text{and} ~~~ \partial_{a} \mathbf{LME}^* = \left( \begin{array}{c} \partial_{a} LME_1^*(a) \\ \vdots \\ \partial_{a} LME_m^*(a) \end{array} \right) . \] All functions are evaluated at the optimum, denoted using the asterisk \(^*\) notation as \(f_{con, \bullet, a}^*=f_{con, \bullet, a}({\bf z}^*,a)\), \(f_{con, \bullet, {\bf z}}^*=f_{con, \bullet, {\bf z}}({\bf z}^*,a)\), \(SLA_{{\bf z} a}^* =SLA_{{\bf z} a}({\bf z}^*,a,\mathbf{LME}^*)\), and \(SLA_{{\bf z} {\bf z}'}^* =SLA_{{\bf z} {\bf z}'}({\bf z}^*,a,\mathbf{LME}^*)\).

To evaluate the sensitivities, we use the following result.


Perturbation Sensitivity Proposition. Assume that the objective and constraint functions are sufficiently smooth and that the Hessian \(SLA_{{\bf z} {\bf z}'}^*\) is invertible. Then, we have \[\begin{equation} \begin{array}{ll} \partial_{a} \mathbf{LME}^* &= \left\{ f_{con, \bullet, {\bf z}'}^* SLA_{{\bf z} {\bf z}'}^{*-1}~ f_{con, \bullet, {\bf z}}^* \right\}^{-1} \\ &~~~~~~ \left\{f_{con, \bullet, a}^*- f_{con, \bullet, {\bf z} '}^*~ SLA_{{\bf z} {\bf z}'}^{*-1} ~SLA_{{\bf z} a}^* \right\} \end{array} \tag{10.1} \end{equation}\] and \[\begin{equation} \partial_{a} ~{\bf z}^* = -SLA_{{\bf z} {\bf z}'}^{*-1} \left\{SLA_{{\bf z} a}^* + f_{con, \bullet, {\bf z}}^* ~\partial_{a} \mathbf{LME}^* \right\} . \tag{10.2} \end{equation}\]


Equations (10.1) and (10.2) give explicit expressions for the sensitivities.
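
As a minimal numerical sketch of how equations (10.1) and (10.2) can be evaluated, the following Python code applies them to a toy problem that is not taken from the text: minimize \(f_0({\bf z},a) = (z_1-a)^2 + 2z_2^2\) subject to the single active constraint \(f_{con,1}({\bf z}) = z_1 + z_2 - 1 = 0\). The helper names (`solve`, `perturbation_sensitivities`) and the toy problem are illustrative assumptions. Solving the toy problem by hand gives \(z_1^*(a) = (a+2)/3\), \(z_2^*(a) = (1-a)/3\), and \(LME_1^*(a) = 4(a-1)/3\), so the sensitivities should equal \(1/3\), \(-1/3\), and \(4/3\), respectively.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def perturbation_sensitivities(H, F, f_a, sla_za):
    """Evaluate equations (10.1) and (10.2).

    H      : p_z x p_z Hessian of the shortened Lagrangian at the optimum
    F      : p_z x m matrix whose columns are the active constraint gradients
    f_a    : length-m list of constraint derivatives in the auxiliary variable
    sla_za : length-p_z list of mixed partials of the shortened Lagrangian
    """
    m = len(f_a)
    Ft = transpose(F)
    lhs = matmul(Ft, solve(H, F))                        # F' H^{-1} F
    corr = matmul(Ft, solve(H, [[v] for v in sla_za]))   # F' H^{-1} SLA_za
    rhs = [[f_a[j] - corr[j][0]] for j in range(m)]
    d_lme = [row[0] for row in solve(lhs, rhs)]          # equation (10.1)
    inner = [[sla_za[i] + sum(F[i][j] * d_lme[j] for j in range(m))]
             for i in range(len(H))]
    d_z = [-row[0] for row in solve(H, inner)]           # equation (10.2)
    return d_z, d_lme


# Toy problem (hypothetical): derivatives evaluated at the optimum.
H = [[2.0, 0.0], [0.0, 4.0]]   # Hessian of (z1 - a)^2 + 2 z2^2; the constraint is linear
F = [[1.0], [1.0]]             # gradient of z1 + z2 - 1
f_a = [0.0]                    # the constraint does not depend on a
sla_za = [-2.0, 0.0]           # mixed partials of the objective in (z, a)

d_z, d_lme = perturbation_sensitivities(H, F, f_a, sla_za)
print(d_z, d_lme)
```

In this toy case the two decision variable sensitivities sum to zero, as the Linear Constraint Corollary later in this section requires for an active linear constraint.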

Discussion. The proofs of equations (10.1) and (10.2) are in Appendix Section 10.5.4 where detailed discussions of precise statements of “sufficiently smooth” may be found based on work in Fiacco (1976) and Fiacco (1983). We also refer to Robinson (1974) and Fiacco and McCormick (1990), Theorem 6, that provide alternative smoothness conditions.

To see how these results may be applied, consider the following special cases.

Example 10.1. \(RTC_{max}\) as an Auxiliary Variable in the ANU Case. Let us continue with the set up in Example 9.3. Here, there is a single active constraint, \(f_{con,1}({\bf z}) = RTC({\bf z}) - RTC_{max}\) and we take as the auxiliary variable \(a = RTC_{max}\). Then, \(f_{con,1, a}({\bf z}) = -1\). Further, \(SLA_{{\bf z} a}({\bf z},a,LME_1)\) \(= f_{0, {\bf z} a}({\bf z},a) + LME_1 ~~ \partial_{{\bf z} a} f_{con,1}({\bf z},a) = 0\). Thus, equations (10.1) and (10.2) reduce to \[ \partial_{RTC_{max}} LME_1^* = -\left\{ f_{con, 1, {\bf z}'}^* SLA_{{\bf z} {\bf z}'}^{*-1}~ f_{con, 1, {\bf z}}^* \right\}^{-1} \] and \[ \partial_{RTC_{max}} ~{\bf z}^* = -SLA_{{\bf z} {\bf z}'}^{*-1} f_{con, 1, {\bf z}}^* ~~\partial_{RTC_{max}} LME_1^* . \] After calculations, it turns out that \(\partial_{RTC_{max}} LME_1^* =\) -0.003 and the first three decision variable sensitivities are \(\partial_{RTC_{max}} z_1^* =\) -1.552, \(\partial_{RTC_{max}} z_2^* =\) -0.972, and \(\partial_{RTC_{max}} z_3^* =\) -1.019. For example, by increasing \(RTC_{max}\) by one unit, we anticipate the optimal value of the upper limit for the first risk, General and Products liability, to decrease by 1.552.

Example 10.2. Confidence Level \(\alpha\) as an Auxiliary Variable in the ANU Case. Let us continue with the set up in Example 9.5. As in Example 10.1, there is a single active constraint, \(f_{con,1}({\bf z}) = RTC({\bf z}) - RTC_{max}\) but now we take as the auxiliary variable \(a = \alpha\). Then, \(f_{con,1, \alpha}({\bf z}) = 0\). Further, \(SLA_{{\bf z} a}({\bf z},\alpha,LME_1)= f_{0, {\bf z} \alpha}({\bf z},\alpha)\). Thus, equations (10.1) and (10.2) reduce to \[ \begin{array}{ll} \partial_{\alpha} LME_1^* &= -f_{con, 1, {\bf z} '}^*~ SLA_{{\bf z} {\bf z}'}^{*-1} ~f_{0, {\bf z} \alpha}^*({\bf z},\alpha) / \left\{ f_{con, 1, {\bf z}'}^* SLA_{{\bf z} {\bf z}'}^{*-1}~ f_{con, 1, {\bf z}}^* \right\} \end{array} \] and \[ \partial_{\alpha} ~{\bf z}^* = -SLA_{{\bf z} {\bf z}'}^{*-1} \left\{f_{0, {\bf z} \alpha}^*({\bf z},\alpha) + f_{con, 1, {\bf z}}^* ~\partial_{\alpha} LME_1^* \right\} . \] Recall from Example 9.4, we established \(f_{0\alpha}({\bf z}^*,\alpha) = \frac{1}{1-\alpha}\left\{ES^* - VaR^*\right\}\) that simplifies calculations. With the calculations, it turns out that \(\partial_{\alpha} LME_1^* =\) 7.565. This partial derivative is per unit change in \(\alpha\); it is easier to divide by 100 and think about changes per percentage unit of \(\alpha\). With this scale, the first three decision variable sensitivities are \(\partial_{\alpha} z_1^* =\) 0.02356, \(\partial_{\alpha} z_2^* =\) 0.21466, and \(\partial_{\alpha} z_3^* =\) 0.21271. For example, by increasing \(\alpha\) by one percentage point, we anticipate the optimal value of the upper limit for the first risk, General and Products liability, to increase by 0.02356. From a practical perspective, these changes are small suggesting that risk managers need not be overly concerned when selecting the confidence level.

Linear Constraint Corollary. Suppose that the \(j\)th constraint is linear in the decision variables and is of the form \(f_{con,j}({\bf z}) = {\bf p}'_j {\bf z} - p_0 \le 0\), for a constant \(p_0\) and vector \({\bf p}_j\). Then, under the conditions of the Perturbation Sensitivity Proposition, we have \[ {\bf p}'_j ~\partial_a{\bf z}^* = 0 . \] Remark. In particular, note that if an edge constraint such as \(z_j \ge 0\) or \(z_j \le 1\) is active (as will be the case in the subsequent quota share examples), this means that the corresponding sensitivity is zero.
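
The corollary can be motivated by a one-line argument, sketched here under the additional assumptions that \({\bf p}_j\) and \(p_0\) do not depend on \(a\) and that the constraint remains active for all values of the auxiliary variable near the one of interest. The constraint then holds with equality along the path of optima, and differentiating that identity gives the result:

\[
{\bf p}_j' \, {\bf z}^*(a) - p_0 = 0 ~~\text{for all nearby } a
\quad \Longrightarrow \quad
\partial_a \left\{ {\bf p}_j' \, {\bf z}^*(a) - p_0 \right\} = {\bf p}_j' \, \partial_a {\bf z}^* = 0 .
\]

For an active edge constraint such as \(z_j \le 1\), the vector \({\bf p}_j\) is the \(j\)th unit vector, so the display reduces to \(\partial_a z_j^* = 0\), in line with the Remark.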

10.2 Risk Retention Sensitivities

Following up on Examples 10.1 and 10.2, this section shows how to use the perturbation sensitivity proposition for the risk retention problem with an expected shortfall objective function \[ f_{0}({\bf z},a) = f_{0}(z_0, \boldsymbol \theta,a) = z_0 + \frac{1}{1-\alpha} \left\{\mathrm{E}[g({\bf X};\boldsymbol \theta) - z_0]_+ \right\} . \] The constraints include a budget constraint \(f_{con,1}({\bf z},a) =RTC(\boldsymbol \theta,a) - RTC_{max} \le 0\) and additional active edge constraints \(f_{con,j}({\bf z},a)\), \(j=2, \ldots, m\), of the form \({\bf P} \boldsymbol \theta \le {\bf p}_0\).

10.2.1 Hessian of the Active Set Lagrangian

Although analysts may not need second derivatives for routine constrained optimization problems, we have seen that they are a key component of the perturbation sensitivity proposition. In addition, Section 10.3 will show that we can get additional insights into the decision variable uncertainty by examining the Hessian of the Lagrangian at the optimum. Recall that the solution to many constrained optimization problems can be determined by minimizing the Lagrangian and so naturally one is interested in the curvature of this function at the optimum. The Hessian describes this curvature (see, for example, Luenberger, Ye, et al. (2016), page 192, or Boyd and Vandenberghe (2004), page 71). The sharper this curvature, the more reliable are results that may be subject to decision variable uncertainty.

Appendix Section 10.5.3 develops the Hessian of the shortened, or active set, Lagrangian as \[\begin{equation} SLA_{{\bf z} {\bf z}'}({\bf z},{\bf LME}) = {\tilde {\bf X}}' {\tilde {\bf X}} + \sum_{j=1}^m LME_j ~ \partial_{\bf z}\partial_{\bf z'} f_{con,j}({\bf z}) , \tag{10.3} \end{equation}\] with \({\tilde {\bf X}} = \left( \begin{array}{cc}\sqrt{{\bf W}_{\bf z}}~{\bf 1} & -\sqrt{{\bf W}_{\bf z}}~{\bf G}\\ \end{array} \right)\), the gradients \({\bf G} = [\partial_{\boldsymbol \theta} g({\bf X}_1; \boldsymbol \theta), \ldots, \partial_{\boldsymbol \theta} g({\bf X}_R; \boldsymbol \theta)]'\), and a diagonal matrix, \(\sqrt{{\bf W}_{\bf z}} = diag(\sqrt{w_1({\bf z})} , \ldots, \sqrt{w_R({\bf z})}~)\) having weights \(w_r({\bf z}) =k\left(\frac{z_0-g({\bf X}_r; \boldsymbol \theta)}{b}\right)/[(1-\alpha)Rb]\). The first matrix on the right hand side of equation (10.3), \({\tilde {\bf X}}' {\tilde {\bf X}}\), is the Hessian of the objective function, \(f_{0,{\bf z} {\bf z}'}({\bf z})\).

What is the role of the constraint functions? The edge constraints of the form \({\bf P} \boldsymbol \theta \le {\bf p}_0\) are linear in the risk retention/decision variables. Hence, their second derivative is zero and they do not appear in the expression for \(SLA_{{\bf z} {\bf z}'}\). Thus, one only needs to consider the budget constraint for which we have \(\partial_{\bf z}\partial_{\bf z'} f_{con,1}({\bf z})\) \(=\partial_{\bf z}\partial_{\bf z'} RTC(\boldsymbol \theta,a)\). From this, it is easy to see that for the basic quota share problem, the second term on the right-hand side of equation (10.3) is a matrix of zeros. In contrast, for the excess of loss problem, this term is a diagonal matrix \(LME_1~\partial_{\bf z}\partial_{\bf z'} f_{con,1}({\bf z})\) \(=LME_1~\boldsymbol \Lambda\), where \(\boldsymbol \Lambda =diag\left[0, f_1(\theta_1), \ldots, f_p(\theta_p) \right]\).

The expression in equation (10.3) provides a handy computational form that was already used in Examples 10.1 and 10.2 and will be used again in examples that follow.
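
To make the computational form of equation (10.3) concrete, here is a small Python sketch that assembles \({\tilde {\bf X}}' {\tilde {\bf X}} + LME_1 \boldsymbol \Lambda\) from a handful of outcomes. Several ingredients are assumptions made only for illustration: a Gaussian kernel for \(k\), a single-risk excess of loss retained loss \(g(X;u) = X \wedge u\) (whose derivative in \(u\) is the indicator that \(X > u\)), and hypothetical numerical inputs for the outcomes, bandwidth, multiplier, and the diagonal of \(\boldsymbol \Lambda\). The function name `active_set_hessian` is likewise hypothetical.

```python
import math


def gaussian_kernel(t):
    """Standard normal density, used here as an assumed choice for the kernel k."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def active_set_hessian(g_values, gradients, z0, alpha, b, lme1, lam_diag):
    """Assemble X-tilde' X-tilde + LME_1 * diag(lam_diag), as in equation (10.3).

    g_values  : retained losses g(X_r; theta), one per outcome
    gradients : gradient of g in theta for each outcome (list of lists)
    lam_diag  : diagonal of Lambda, with a leading 0 for the z0 coordinate
    """
    R = len(g_values)
    p = 1 + len(gradients[0])
    hess = [[0.0] * p for _ in range(p)]
    for g_r, grad_r in zip(g_values, gradients):
        w_r = gaussian_kernel((z0 - g_r) / b) / ((1.0 - alpha) * R * b)
        row = [1.0] + [-d for d in grad_r]   # r-th row of X-tilde, before sqrt(w_r) scaling
        for i in range(p):
            for j in range(p):
                hess[i][j] += w_r * row[i] * row[j]
    for i in range(p):
        hess[i][i] += lme1 * lam_diag[i]     # contribution of the budget constraint
    return hess


# Hypothetical inputs: five outcomes of one risk with upper limit u.
losses = [0.5, 1.2, 2.0, 3.5, 5.0]
u = 3.0
g_values = [min(x, u) for x in losses]
gradients = [[1.0 if x > u else 0.0] for x in losses]

H = active_set_hessian(g_values, gradients, z0=2.8, alpha=0.8, b=0.5,
                       lme1=2.0, lam_diag=[0.0, 0.3])
print(H)
```

Setting `lme1=0.0` (or `lam_diag` to zeros) returns the objective function Hessian alone, which is the quota share form of the second term described above.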

10.2.2 Risk Distribution Parameter Sensitivities - Bivariate Excess of Loss

Example 10.3. Parameter Sensitivities for Excess of Loss with Two Risks. Let us continue from Example 7.2, a bivariate excess of loss problem where \(X_1\) follows a gamma distribution and \(X_2\) follows a Pareto distribution. Refer to Example 7.2 for specific parameter values of the distributions.

This problem has been solved four times with different objective functions: one for standard deviation and three for the expected shortfall with confidence levels \(\alpha = 0.95, 0.85, 0.75\). Table 10.1 summarizes results, with each objective function presented in a separate column. As before, once an objective function has been selected and optimal values of retention parameters have been determined, all other risk measures can be evaluated at the optimum. This allows for comparison of results from different objective functions.

Table 10.1: Excess of Loss: Baseline Results from Four Optimization Problems
\[ \begin{array}{l|rrrr} \hline
 & \text{Std Dev} & ES: 0.95 & ES: 0.85 & ES: 0.75 \\ \hline
\text{Optimal } u_1 & 5064.60 & 5480.40 & 5364.56 & 5092.53 \\
\text{Optimal } u_2 & 1782.48 & 1210.94 & 1336.12 & 1730.78 \\
\text{Optimal } LME_1 & 1.13 & 4.14 & 3.66 & 2.95 \\
\text{Std Dev} & 1906.24 & 1927.92 & 1918.27 & 1906.35 \\
ES: 0.95 & 6847.07 & 6691.33 & 6700.68 & 6823.31 \\
ES: 0.85 & 6700.66 & 6655.38 & 6648.79 & 6690.92 \\
ES: 0.75 & 6375.39 & 6429.27 & 6402.87 & 6375.06 \\ \hline
\end{array} \]

In this example, given the long tail of the Pareto distribution, we initially investigate the sensitivity to the scale of the Pareto distribution. Thus, we represent the second risk as \(X_2(a) = X_2(1+\frac{a}{100})\). This formulation allows us to consider small changes in the auxiliary variable \(a\) in percentage terms.

All other aspects of the problem remain unchanged. For bivariate excess of loss, the risk owner retains \(S(u_1,u_2,a) = X_1 \wedge u_1 + X_2(a) \wedge u_2\). The budget requirement (assumed to be binding) is represented as \(f_{con,1}(u_1,u_2,a)\) \(= RTC(u_1,u_2,a) - RTC_{max}\le 0\) where, for simplicity, we use a fair risk transfer cost \(RTC(u_1,u_2,a) =\) \(\mathrm{E} [S(\infty,\infty, a)] - \mathrm{E}[S(u_1,u_2,a)]\). A maximal risk transfer cost \(RTC_{max} =\) \(0.20 \times \{\mathrm{E}(X_1) + \mathrm{E}(X_2(0))\}\) \(= 1,000\) is used in this example.
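
To illustrate how the scale perturbation \(X_2(a) = X_2(1+\frac{a}{100})\) enters the budget constraint, the Python sketch below evaluates a fair risk transfer cost for a single Pareto risk and differentiates it numerically with respect to \(a\). It assumes the two-parameter (Lomax) form of the Pareto distribution with shape greater than one, for which \(\mathrm{E}[(X-u)_+] = \frac{scale}{shape-1}\left(\frac{scale}{u+scale}\right)^{shape-1}\). The shape, scale, and upper limit are hypothetical and are not the values from Example 7.2; the function names are also illustrative.

```python
def pareto_excess(u, shape, scale):
    """E[(X - u)_+] for a Lomax (Pareto type II) risk; requires shape > 1."""
    return scale / (shape - 1.0) * (scale / (u + scale)) ** (shape - 1.0)


def fair_rtc(u2, a, shape, scale):
    """Fair cost of transferring the layer above u2 when the risk is X2 * (1 + a/100).

    Multiplying a Lomax risk by a constant multiplies its scale parameter by that constant.
    """
    return pareto_excess(u2, shape, scale * (1.0 + a / 100.0))


def d_rtc_da(u2, shape, scale, h=1e-4):
    """Central-difference estimate of the derivative of the fair RTC in a, at a = 0."""
    return (fair_rtc(u2, h, shape, scale) - fair_rtc(u2, -h, shape, scale)) / (2.0 * h)


# Hypothetical parameter values, for illustration only.
shape, scale, u2 = 3.0, 2000.0, 1500.0
rtc_base = fair_rtc(u2, 0.0, shape, scale)
rtc_slope = d_rtc_da(u2, shape, scale)
print(rtc_base, rtc_slope)
```

A positive slope indicates that a larger scale raises the cost of transferring the same layer, so the binding budget constraint tightens; this is the channel captured by the constraint derivative \(f_{con,1,a}\) in equation (10.1).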

For each of the four optimization problems, Table 10.2 shows the risk retention sensitivities \(\partial_a u_1^*(a)\) and \(\partial_a u_2^*(a)\), as well as the Lagrange multiplier sensitivity \(\partial_a LME_1^*(a)\). Although the perturbation changes only the scale of the Pareto distribution (the second risk), we still observe changes in the first risk retention parameter \(u^*_1\) and the Lagrange multiplier.

The results were obtained through a relatively straightforward implementation of R code. Following the coding of the objective and budget constraint functions, numerical evaluation of the derivatives and matrix of second derivatives was conducted using the functions grad and hessian, respectively, from the numDeriv package.

Table 10.2: Excess of Loss Sensitivities with Auxiliary \(a\) = Pareto Scale
\[ \begin{array}{l|rrrr} \hline
 & \text{Std Dev} & ES: 0.95 & ES: 0.85 & ES: 0.75 \\ \hline
\text{Sensitivity } u_1 & 10.974 & 11.540 & 9.160 & 9.524 \\
\text{Sensitivity } u_2 & 15.896 & 16.634 & 19.333 & 18.781 \\
\text{Sensitivity } LME_1 & 0.002 & 0.018 & 0.005 & 0.003 \\ \hline
\end{array} \]

Table 10.3 shows similar results when varying the scale parameter for the first risk (Scale 1).

Table 10.3: Excess of Loss Sensitivities with Auxiliary \(a\) = Gamma Scale
\[ \begin{array}{l|rrrr} \hline
 & \text{Std Dev} & ES: 0.95 & ES: 0.85 & ES: 0.75 \\ \hline
\text{Sensitivity } u_1 & 67.529 & 73.008 & 66.799 & 63.307 \\
\text{Sensitivity } u_2 & 16.671 & 7.137 & 14.355 & 23.919 \\
\text{Sensitivity } LME_1 & 0.004 & 0.028 & 0.010 & 0.009 \\ \hline
\end{array} \]

Exercise 10.5 outlines follow-up analyses of this very interesting example.


10.2.3 Wisconsin Property Fund Sensitivities

Section 8.1.2 demonstrated a comprehensive optimization analysis for a specific fund member (number 2), presenting optimal risk retention parameters, risk transfer costs, and risk measures across a range of maximal risk transfer costs; see Table 8.3. Supplementary graphs provide an intuitive understanding for the trade-off between risk transfer costs and uncertainty of retained risks.

Naturally, it is possible to present many alternative “what-if” scenarios, examining each assumption underpinning the analysis. However, there were over one thousand fund members when these data were analyzed. An analyst would prefer a set of techniques that can be readily scaled up to multiple analyses, aligning with one of the objectives of the Envelope theorem approach developed here.

To maintain focus, we assume that fund member 2 aims for a maximal risk transfer cost of \(RTC_{max} = 66\), in thousands of dollars. According to Table 8.3, this is near the center of potential costs, representing 55 percent of maximal costs.

Similar to the Section 10.2.2 example, here we investigate the sensitivity to the scale of each variable by employing alternatives \(X_j(a) = X_j (1+\frac{a}{100})\), \(j=1, \ldots, 6\). Table 10.4 shows the results. The first row provides the basic outcomes from the optimization run, without considering alternatives. For the second row, we examine sensitivities for the scale of the first variable. For instance, a 10% increase in the first scale parameter would lead us to anticipate an increase of 13.78 (in thousands) in the optimal \(u_{BC}\). The table also shows we expect the optimal \(u_{IM}\) to increase by 1.88, and similarly for other columns and other rows.

Table 10.4: Fund Member 2 Excess of Loss: Scale Parameter Sensitivities
\[ \begin{array}{l|rrrrrrr} \hline
 & u_{BC} & u_{IM} & u_{CN} & u_{CO} & u_{PN} & u_{PO} & LME \\ \hline
Base & 51.672 & 17.676 & 26.377 & 35.479 & 15.097 & 9.668 & 3.664 \\
Scale_{BC} & 1.378 & 0.188 & 0.148 & 0.229 & 0.091 & 0.083 & -0.108 \\
Scale_{IM} & 0.140 & 0.223 & 0.026 & 0.040 & 0.016 & 0.014 & -0.019 \\
Scale_{CN} & 0.098 & 0.023 & 0.287 & 0.028 & 0.011 & 0.010 & -0.013 \\
Scale_{CO} & 0.243 & 0.057 & 0.044 & 0.397 & 0.027 & 0.025 & -0.032 \\
Scale_{PN} & 0.055 & 0.013 & 0.010 & 0.016 & 0.150 & 0.006 & -0.007 \\
Scale_{PO} & 0.070 & 0.016 & 0.013 & 0.020 & 0.008 & 0.101 & -0.009 \\ \hline
\end{array} \]
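
The interpretation of Table 10.4 given above amounts to a first-order (linear) approximation, \(u^*(a) \approx u^*(0) + a \, \partial_a u^*\). The short Python sketch below reproduces that arithmetic for a 10% increase in the first scale parameter, using the \(Base\) and \(Scale_{BC}\) entries for \(u_{BC}\) and \(u_{IM}\); the function name is hypothetical, and the approximation can be expected to be less accurate for large perturbations.

```python
# Entries copied from Table 10.4 (in thousands).
base = {"u_BC": 51.672, "u_IM": 17.676}
sens_scale_bc = {"u_BC": 1.378, "u_IM": 0.188}


def first_order_update(base_values, sensitivities, a):
    """Linear approximation to the optimal retention parameters after a change of a percent."""
    return {name: base_values[name] + sensitivities[name] * a for name in base_values}


approx = first_order_update(base, sens_scale_bc, a=10.0)
change = {name: approx[name] - base[name] for name in base}
print(approx, change)
```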

10.2.4 ANU Case Sensitivities

Examples 9.5 and 9.6 discussed the sensitivity of ANU case results to the assumptions of the confidence level and the form of the risk measure. This section follows up these examples by examining the sensitivity of results to the scale assumptions of each risk.

To be specific, we focus on the middle value of the maximal risk transfer cost that is \(RTC_{max}\) = 722, corresponding to the sixth row of Table 8.8.

The sensitivities for scale parameters are given in Table 10.5. This table provides a fairly complete picture; for a partial change of each of the 14 scale parameters, the table summarizes the change in each of the 14 upper limit parameters. As one would expect, changes along the diagonal are generally the largest.

How reliable are these estimates of the sensitivities? One way to assess this is via bootstrapping, as shown in Table 10.6. For readers interested in more detail of these analyses and corresponding code, see Frees and Shi (2024).

Table 10.5: ANU Excess of Loss: Scale Parameter Sensitivities
\[ \small{ \begin{array}{l|rrrrrrrrrrrrrr} \hline
 & u_{2} & u_{3} & u_{4} & u_{5} & u_{6} & u_{7} & u_{8} & u_{9} & u_{10} & u_{11} & u_{12} & u_{13} & u_{14} & u_{15} \\ \hline
Base & 334.15 & 346.73 & 261.60 & 246.07 & 98.32 & 124.28 & 185.00 & 289.59 & 200.69 & 191.09 & 245.61 & 415.37 & 183.23 & 212.82 \\
Scale_{2} & 17.01 & 5.46 & 6.04 & 4.42 & 0.68 & 0.00 & 0.02 & 5.60 & 2.48 & 1.07 & 0.61 & 0.10 & 0.32 & 0.80 \\
Scale_{3} & 2.21 & 4.39 & 1.00 & 0.73 & 0.11 & 0.00 & 0.00 & 0.93 & 0.41 & 0.18 & 0.10 & 0.02 & 0.05 & 0.13 \\
Scale_{4} & 2.56 & 1.05 & 3.98 & 0.85 & 0.13 & 0.00 & 0.01 & 1.08 & 0.48 & 0.21 & 0.12 & 0.02 & 0.06 & 0.15 \\
Scale_{5} & 2.25 & 0.92 & 1.02 & 2.91 & 0.11 & 0.00 & 0.00 & 0.95 & 0.42 & 0.18 & 0.10 & 0.02 & 0.05 & 0.13 \\
Scale_{6} & 0.11 & 0.04 & 0.05 & 0.04 & 0.93 & 0.00 & 0.00 & 0.05 & 0.02 & 0.01 & 0.00 & 0.00 & 0.00 & 0.01 \\
Scale_{7} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
Scale_{8} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1.26 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
Scale_{9} & 2.55 & 1.04 & 1.16 & 0.85 & 0.13 & 0.00 & 0.00 & 3.84 & 0.47 & 0.21 & 0.12 & 0.02 & 0.06 & 0.15 \\
Scale_{10} & 0.40 & 0.16 & 0.18 & 0.13 & 0.02 & 0.00 & 0.00 & 0.17 & 2.14 & 0.03 & 0.02 & 0.00 & 0.01 & 0.02 \\
Scale_{11} & 0.19 & 0.08 & 0.09 & 0.06 & 0.01 & 0.00 & 0.00 & 0.08 & 0.04 & 1.84 & 0.01 & 0.00 & 0.00 & 0.01 \\
Scale_{12} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 2.35 & 0.00 & 0.00 & 0.00 \\
Scale_{13} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 4.15 & 0.00 & 0.00 \\
Scale_{14} & 0.01 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1.81 & 0.00 \\
Scale_{15} & 0.03 & 0.01 & 0.01 & 0.01 & 0.00 & 0.00 & 0.00 & 0.01 & 0.01 & 0.00 & 0.00 & 0.00 & 0.00 & 2.00 \\ \hline
\end{array} } \]

Table 10.6: Bootstrap Standard Errors of Scale Parameter Sensitivities
\[ \small{ \begin{array}{l|rrrrrrrrrrrrrr} \hline
 & u_{2} & u_{3} & u_{4} & u_{5} & u_{6} & u_{7} & u_{8} & u_{9} & u_{10} & u_{11} & u_{12} & u_{13} & u_{14} & u_{15} \\ \hline
Scale_{2} & 0.85 & 0.73 & 0.83 & 0.38 & 0.08 & 0 & 0.10 & 0.78 & 0.30 & 0.09 & 0.13 & 0.03 & 0.06 & 0.16 \\
Scale_{3} & 0.15 & 0.29 & 0.14 & 0.07 & 0.01 & 0 & 0.01 & 0.14 & 0.05 & 0.02 & 0.02 & 0.00 & 0.01 & 0.03 \\
Scale_{4} & 0.21 & 0.15 & 0.32 & 0.09 & 0.02 & 0 & 0.02 & 0.14 & 0.06 & 0.01 & 0.02 & 0.01 & 0.01 & 0.03 \\
Scale_{5} & 0.14 & 0.13 & 0.12 & 0.12 & 0.01 & 0 & 0.01 & 0.14 & 0.05 & 0.01 & 0.02 & 0.00 & 0.01 & 0.03 \\
Scale_{6} & 0.01 & 0.01 & 0.01 & 0.00 & 0.09 & 0 & 0.00 & 0.01 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
Scale_{7} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0 & 0.01 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
Scale_{8} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0 & 0.57 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
Scale_{9} & 0.16 & 0.16 & 0.17 & 0.08 & 0.02 & 0 & 0.01 & 0.32 & 0.06 & 0.02 & 0.02 & 0.01 & 0.01 & 0.03 \\
Scale_{10} & 0.03 & 0.03 & 0.04 & 0.01 & 0.00 & 0 & 0.00 & 0.02 & 0.18 & 0.00 & 0.00 & 0.00 & 0.00 & 0.01 \\
Scale_{11} & 0.02 & 0.01 & 0.01 & 0.01 & 0.00 & 0 & 0.00 & 0.01 & 0.00 & 0.12 & 0.00 & 0.00 & 0.00 & 0.00 \\
Scale_{12} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0 & 0.01 & 0.00 & 0.00 & 0.00 & 0.39 & 0.00 & 0.00 & 0.00 \\
Scale_{13} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0 & 0.01 & 0.00 & 0.00 & 0.00 & 0.00 & 0.64 & 0.00 & 0.00 \\
Scale_{14} & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.16 & 0.00 \\
Scale_{15} & 0.01 & 0.00 & 0.00 & 0.00 & 0.00 & 0 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.29 \\ \hline
\end{array} } \]
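
For readers unfamiliar with bootstrap standard errors, the following Python sketch shows the generic nonparametric recipe: resample the data with replacement, recompute the statistic on each resample, and report the standard deviation of the replicates. It is only a schematic under stated assumptions. The statistic here is a sample mean used as a placeholder, the data are simulated, and the function name, number of replicates, and seeds are hypothetical; in the application, each replicate would require recomputing a sensitivity, and the procedure actually used is documented in Frees and Shi (2024).

```python
import random
import statistics


def bootstrap_standard_error(data, statistic, n_boot=1000, seed=2024):
    """Standard deviation of a statistic across nonparametric bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


# Simulated placeholder data; the sample mean stands in for a sensitivity estimate.
data_rng = random.Random(1)
sample = [data_rng.expovariate(1.0) for _ in range(200)]
se = bootstrap_standard_error(sample, statistics.mean)
print(se)
```

For the sample mean, the result can be compared with the textbook formula, the sample standard deviation divided by the square root of the sample size, which offers a simple check that the resampling loop behaves as intended.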

10.3 Curvature of the Active Set Lagrangian

As outlined in Section 10.2.1, for excess of loss and quota share insurable risk portfolios, one can express the Hessian of the active, or shortened, Lagrangian as \[\begin{equation} SLA_{{\bf z} {\bf z}'}({\bf z},{\bf LME}) = {\tilde {\bf X}}' {\tilde {\bf X}} + LME_1 \left\{\begin{array}{cl} {\bf 0} & \text{Quota Share} \\ \boldsymbol \Lambda & \text{Excess of Loss} \\ \end{array} \right. . \tag{10.4} \end{equation}\] See also Appendix Section 10.5.3. Equation (10.4) provides not only a handy computational formula for the Hessian of the active set Lagrangian but also one that is amenable to an informative interpretation.

This Hessian can be viewed as a balance between an ill-conditioned matrix (\(f_{0,{\bf z} {\bf z}'}({\bf z})={\tilde {\bf X}}' {\tilde {\bf X}}\)) and a well-conditioned one (\(\boldsymbol \Lambda\)); this balance is a feature of many related problems. Many analysts will recognize an analogy with the comparison between ordinary least squares and ridge regression (see, for example, Hastie, Tibshirani, and Friedman (2009), Section 3.4.1).

To explore why excess of loss contracts do not suffer from decision variable uncertainty, we revisit the discussion in Section 10.2.1. There, we posited that the curvature of the Lagrangian, as summarized by its Hessian, yields insights into the decision variable uncertainty problem.

Condition Number. There are numerous ways to summarize the curvature of the Hessian, but arguably the most natural choice is the condition number. Based on equations (10.3) and (10.4), we observe that the Hessians we work with are typically symmetric and nonnegative definite. In this scenario, one may interpret the condition number as the ratio of the largest to the smallest eigenvalues. Each eigenvalue summarizes the curvature in the direction of its corresponding eigenvector. Hence, the condition number of the Hessian can be understood as the ratio of the largest to the smallest curvatures at the optimum (see for example, Bierlaire (2018), Section 2.5).
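
As a small illustration of the eigenvalue interpretation, the Python sketch below computes the condition number of a symmetric \(2 \times 2\) matrix in closed form and shows how adding a diagonal term of the form \(LME_1 \boldsymbol \Lambda = diag(0, \lambda)\) can reduce it. The matrix entries and the value of \(\lambda\) are hypothetical, chosen so that the two columns of the first matrix are nearly collinear. Note that this sketch uses the eigenvalue ratio described above; a condition number based on a different norm, such as the infinity norm, generally gives a different value.

```python
import math


def sym2_eigenvalues(a, b, d):
    """Eigenvalues (largest, smallest) of the symmetric matrix [[a, b], [b, d]]."""
    mid = 0.5 * (a + d)
    half_gap = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mid + half_gap, mid - half_gap


def condition_number(a, b, d):
    """Ratio of the largest to the smallest eigenvalue of [[a, b], [b, d]]."""
    largest, smallest = sym2_eigenvalues(a, b, d)
    if smallest <= 0.0:
        return math.inf   # singular or indefinite: no finite ratio of curvatures
    return largest / smallest


# Hypothetical objective Hessian with nearly collinear columns.
cond_objective = condition_number(1.0, 0.99, 1.0)
# Same matrix after adding diag(0, 0.5), mimicking the excess of loss term.
cond_lagrangian = condition_number(1.0, 0.99, 1.0 + 0.5)
print(cond_objective, cond_lagrangian)
```

In this hypothetical case the ratio falls from roughly 199 to roughly 10, the same qualitative effect as a ridge penalty in regression.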

Example 10.4. Property Fund Portfolio Condition Numbers. Frees and Shi (2024) examined the Wisconsin Property Fund similarly to Section 8.1.3, but imposed retention limits at the portfolio level, rather than in a proportional manner as demonstrated in that section. For this application, Table 10.7 displays the condition numbers of the Hessian of the excess of loss objective function as well as the active set Lagrangian, both at the optimum. We report the condition number using the infinity norm. We observe that incorporating the extra term for the excess of loss in equation (10.3) substantially reduces the condition number, implying that analysts can anticipate greater curvature, and thus greater reliability, of the results.

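Table 10.7: Comparison of Excess of Loss Condition Numbers
\[ \begin{array}{l|r} \hline
\textbf{Hessian} & \textbf{Condition Number} \\ \hline
\text{Excess of Loss } f_{0zz} & 8.23 \\
\text{Excess of Loss } SLA_{zz} & 1.73 \\ \hline
\end{array} \]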

Investments and Quota Share Portfolios Condition Numbers. In the investments portfolio problem, analysts use shrinkage estimators to estimate the variance-covariance matrix of returns, employing a combination of (ill-conditioned) data-driven estimators and a more stable estimator such as a market model (see, for example, Ledoit and Wolf (2003), Candelon, Hurlin, and Tokpavi (2012), and Schäfer and Strimmer (2005)). Moreover, in the investments portfolio problem, DeMiguel et al. (2009) suggested adding an additional constraint that the norm of the portfolio-weight vector be smaller than a given threshold. The motivation underpinning this suggestion was to enhance the out-of-sample performance of the portfolio (see also Ban, El Karoui, and Lim (2018)). As demonstrated in the following, this type of additional constraint can sharpen the curvature of the Lagrangian, reducing the influence of decision variable uncertainty on risk retention uncertainty.

Building on Example 10.4, Table 10.8 displays the condition number of the Hessian of the active set Lagrangian for the quota share contract. Comparing this to the excess of loss contract in Table 10.7, the quota share condition numbers are substantially larger. This can be explained by examining the matrix on which the Hessian is based, \({\tilde {\bf X}}\), from Section 10.2.1. For quota share, the \(r\)th row of this matrix is \({\sqrt{w_r({\bf z}^*)}}(1, -{\bf X}_r')\), with weight \(w_r({\bf z}^*) =k\left[(z_0^*-\boldsymbol \theta^{*'}{\bf X}_r)/b\right]/[(1-\alpha)Rb]\). By analyzing these weights, we observe that only observations for which \(\boldsymbol \theta^{*'}{\bf X}_r\) is close to \(z_0^*\), that is, observations that are nearly linear about the optimum, receive substantive weight. Consequently, when examining the columns of \({\tilde {\bf X}}\), one can anticipate them to be nearly linear combinations of one another. Such columns naturally result in an ill-conditioned matrix, creating decision variable uncertainty.

Achieving a balance between ill- and well-conditioned matrices in equation (10.4) involves introducing nonlinear constraint functions. This suggests retaining a linear mechanism for sharing risks while introducing a nonlinear risk transfer cost for pricing. We pursued this approach, following the work of DeMiguel et al. (2009). Specifically, for the quota share problem, we imposed the quadratic constraint \(f_{con}({\bf z})=f_{con}(z_0, {\boldsymbol \theta})\) \(= {\boldsymbol \theta}'{\boldsymbol \theta} - \delta\). With the choice of \(\delta =4\), this constraint was found to be binding at the optimum and was therefore included in the shortened Lagrangian. The bottom two rows of Table 10.8 demonstrate that by incorporating this constraint, one can dramatically sharpen the curvature of the Lagrangian, which mitigates the decision variable uncertainty problem.

Table 10.8: Comparison of Quota Share Condition Numbers (in Millions)

| Problem | Hessian | Condition Number |
|---|---|---|
| Quota Share | \(SLA_{zz}=f_{0zz}\) | 824.836 |
| Quota Share Quad Constraint | \(f_{0zz}\) | 958.152 |
| Quota Share Quad Constraint | \(SLA_{zz}\) | 0.351 |
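
To see the mechanism numerically, the following minimal Python sketch builds the \(2 \times 2\) matrix \({\tilde {\bf X}}'{\tilde {\bf X}}\) for a single-risk quota share problem and compares its condition number with and without the curvature contributed by a quadratic constraint of the form \(\theta^2 - \delta\). Everything in it is an assumption made for illustration: the uniform simulated losses, the Gaussian kernel, and the values of \(z_0^*\), \(\theta^*\), \(b\), \(\alpha\), and the Lagrange multiplier are hypothetical and are not the inputs behind Table 10.8.

```python
import math
import random

def condition_number_2x2(a, b, d):
    """Largest over smallest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    lam_max = mean + radius
    lam_min = (a * d - b * b) / lam_max  # determinant / lam_max limits cancellation
    return lam_max / lam_min

def gaussian_kernel(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

# Hypothetical inputs (assumptions, not values from the chapter)
random.seed(2024)
R = 5000
losses = [random.uniform(0.0, 10.0) for _ in range(R)]
z0, theta = 2.0, 0.5          # assumed optimum (z_0^*, theta^*)
bandwidth, alpha = 0.05, 0.95

# Weights w_r = k[(z0 - theta * X_r) / b] / [(1 - alpha) R b]
weights = [
    gaussian_kernel((z0 - theta * x) / bandwidth) / ((1.0 - alpha) * R * bandwidth)
    for x in losses
]

# Entries of X~'X~, whose r-th row is sqrt(w_r) * (1, -X_r)
s_w = sum(weights)
s_wx = sum(w * x for w, x in zip(weights, losses))
s_wxx = sum(w * x * x for w, x in zip(weights, losses))
cond_plain = condition_number_2x2(s_w, -s_wx, s_wxx)

# The constraint theta^2 - delta has Hessian diag(0, 2) in (z0, theta);
# the multiplier value below is hypothetical.
lme = 1.0
cond_quad = condition_number_2x2(s_w, -s_wx, s_wxx + 2.0 * lme)

print("condition number, no constraint curvature:", round(cond_plain, 1))
print("condition number, with quadratic constraint:", round(cond_quad, 1))
```

Because only losses with \(\theta X_r\) near \(z_0\) carry weight, the two columns are almost proportional and the unconstrained matrix is close to singular; adding a positive multiple of \(diag(0, 2)\) lifts the smallest eigenvalue.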

10.4 Data Uncertainty and Stochastic Sensitivity

The preceding sections of this chapter established sensitivities based on perturbations of a scalar auxiliary variable \(a\). One can extend these sensitivities to an auxiliary vector \(\bf{a}\); after all, a partial derivative with respect to a vector, a gradient, is simply a vector of partial derivatives. For example, an interesting case is when \({\bf a} = \boldsymbol \gamma\), that is, the auxiliary vector is a vector of marginal risk parameters, such as those that account for the scale and shape of a distribution. Chapter 12 will delve into another interesting choice where \(\bf a\) represents a collection of dependence parameters.

As with all other assumptions, the optimal decision variables depend on the set of marginal risk parameters; we denote this dependence by \({\bf z}^*(\boldsymbol \gamma)\). For each parameter, one may use equations (10.1) and (10.2) to compute perturbation sensitivities. Collecting these gives \(\nabla {\bf z}^*(\boldsymbol \gamma)\) \(= \partial_{\boldsymbol \gamma} ~{\bf z}^*(\boldsymbol \gamma)\), a matrix of partial derivatives.

Data Uncertainty. Risk distribution parameters may be imprecise due to either simulation uncertainty or sampling (of data) uncertainty, although other sources are clearly possible. In applications, the full distribution may be unknown, but it is common to have available a sample of \(data = \{{\bf X}_1, \ldots, {\bf X}_n\}\) from which one computes distribution parameter estimates \(\hat{\boldsymbol \gamma}=\boldsymbol \gamma(data)\). From this, one computes the optimal decision variables \(\hat{{\bf z}}^*={\bf z}^*(\hat{\boldsymbol \gamma})\). The difficulty arises because small changes in distribution parameters \(\hat{\boldsymbol \gamma}\) can lead to significant changes in decision variables \(\hat{{\bf z}}^*\); we refer to this difficulty as data uncertainty.

To illustrate, one could consider a size \(n\) sample of risks from a parametric distribution. With the usual estimators of parameters, e.g., maximum likelihood, we can rely on standard methods to quantify uncertainty in the distribution parameter estimates. As is customary, we can invoke the central limit theorem and assume that \(\sqrt{n}\left(\hat{\boldsymbol \gamma} - \boldsymbol \gamma \right)\) has an approximate normal distribution with mean vector \({\bf 0}\) and variance-covariance matrix, say, \(\boldsymbol \Sigma_{\boldsymbol \gamma}\). With this, we can quantify the data uncertainty in the optimal decision variables using a Taylor-series approximation.

Stochastic Sensitivity Corollary. Suppose that \(\sqrt{n}\left(\hat{\boldsymbol \gamma} - \boldsymbol \gamma \right)\to_D ~ N({\bf 0}, \boldsymbol \Sigma_{\boldsymbol \gamma})\). Then, under mild smoothness conditions, \[\begin{equation} \sqrt{n}\left[{\bf z}^*(\hat{\boldsymbol \gamma}) - {\bf z}^*(\boldsymbol \gamma) \right]\to_D ~ N{\Large (}{\bf 0}, ~\nabla {\bf z}^*(\boldsymbol \gamma)\boldsymbol \Sigma_{\boldsymbol \gamma}\nabla {\bf z}^*(\boldsymbol \gamma)' {\Large )} . \tag{10.5} \end{equation}\] With the asymptotic distribution, one can readily compute the standard errors of (estimated) optimal decision variables. This provides another way to interpret the sensitivities.
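
As an illustration of how equation (10.5) turns into standard errors, here is a minimal Python sketch that forms \(\nabla {\bf z}^*(\boldsymbol \gamma)\boldsymbol \Sigma_{\boldsymbol \gamma}\nabla {\bf z}^*(\boldsymbol \gamma)'/n\) and takes square roots of its diagonal. The function name and the numerical inputs are hypothetical; in practice the gradient matrix would come from the perturbation sensitivities and \(\boldsymbol \Sigma_{\boldsymbol \gamma}\) from the likelihood analysis.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def transpose(A):
    return [list(col) for col in zip(*A)]

def stochastic_sensitivity_std_errors(grad_z, sigma_gamma, n):
    """Standard errors of z*(gamma_hat) implied by equation (10.5).

    grad_z      : p_z x q matrix of sensitivities (rows: decision variables)
    sigma_gamma : q x q asymptotic variance of sqrt(n) * (gamma_hat - gamma)
    n           : sample size
    """
    cov = matmul(matmul(grad_z, sigma_gamma), transpose(grad_z))
    return [math.sqrt(cov[i][i] / n) for i in range(len(cov))]

# Hypothetical inputs, chosen only to make the arithmetic easy to follow
grad_z = [[2.0, 0.0],
          [1.0, 1.0]]
sigma_gamma = [[4.0, 0.0],
               [0.0, 9.0]]
std_errors = stochastic_sensitivity_std_errors(grad_z, sigma_gamma, n=100)
print(std_errors)
```

With these inputs the sandwich matrix has diagonal entries 16 and 13, so the standard errors are \(\sqrt{16/100}\) and \(\sqrt{13/100}\).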

Example 10.5. Property Fund Portfolio Stochastic Sensitivities. This is a continuation from Example 10.4. In the work of Frees and Shi (2024), a single sample of size \(n=50\) was taken, and standard parametric likelihood analysis was used to estimate the parameters as well as their asymptotic variance. From the estimated parameters, the optimal risk retention parameters were determined and are presented in the first column of Table 10.9. To assess their reliability, the stochastic sensitivity corollary in equation (10.5) was employed to calculate the corresponding standard errors, which are given in the second column of Table 10.9.

To evaluate the accuracy of this asymptotic approximation, a summary of a bootstrap analysis is provided in columns three and four of Table 10.9. For these columns, 100 bootstrap resamples of the single sample were taken, and an optimal retention portfolio was fitted to each resample. Columns three and four report the mean and standard deviation of the fitted values across resamples. The bootstrap analysis relies on fewer assumptions at the expense of additional computation time. For this application, the two sets of results are broadly consistent with one another, which gives risk portfolio managers some confidence in their reliability.

Table 10.9: Stochastic Sensitivities for Excess of Loss

| | Stochastic Sens: Dec Var | Stochastic Sens: Std Error | Bootstrap: Mean | Bootstrap: Std Dev |
|---|---|---|---|---|
| \(VaR\) | 8611.9 | 143.2 | 8604.8 | 124.4 |
| \(u_1^*\) | 3988.3 | 83.8 | 3979.4 | 86.0 |
| \(u_2^*\) | 1201.1 | 41.0 | 1209.4 | 72.3 |
| \(u_3^*\) | 913.1 | 24.7 | 916.3 | 30.8 |
| \(u_4^*\) | 1628.0 | 55.1 | 1628.4 | 46.6 |
| \(u_5^*\) | 676.3 | 23.4 | 692.4 | 51.0 |
| \(u_6^*\) | 582.9 | 21.9 | 582.6 | 19.6 |

Source: Frees and Shi (2024)
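
The comparison in Example 10.5 can be mimicked on a toy problem. The sketch below is not the property fund analysis: it assumes a hypothetical exponential sample of size 50 and uses the \(\alpha\)-quantile of the fitted distribution as a stand-in for the map \(\boldsymbol \gamma \mapsto {\bf z}^*(\boldsymbol \gamma)\), so that no optimizer is needed. It then contrasts the standard error from equation (10.5) with the standard deviation over 100 bootstrap resamples.

```python
import math
import random
import statistics

ALPHA = 0.95

def optimal_decision(scale):
    """Hypothetical stand-in for z*(gamma): ALPHA-quantile of an exponential with mean `scale`."""
    return -scale * math.log(1.0 - ALPHA)

random.seed(10)
n = 50
# Assumed data-generating process: exponential losses with mean 1000
sample = [random.expovariate(1.0 / 1000.0) for _ in range(n)]

# Maximum likelihood estimate of the exponential mean and plug-in decision variable
scale_hat = statistics.fmean(sample)
z_hat = optimal_decision(scale_hat)

# Equation (10.5) with a scalar parameter: the asymptotic variance of
# sqrt(n) * (scale_hat - scale) is scale^2 for the exponential mean.
grad = -math.log(1.0 - ALPHA)          # derivative of optimal_decision in scale
delta_se = abs(grad) * scale_hat / math.sqrt(n)

# Nonparametric bootstrap: resample the data, refit, recompute the decision variable
n_boot = 100
boot_values = []
for _ in range(n_boot):
    resample = random.choices(sample, k=n)
    boot_values.append(optimal_decision(statistics.fmean(resample)))

boot_mean = statistics.fmean(boot_values)
boot_sd = statistics.stdev(boot_values)

print("plug-in value:", round(z_hat, 1), " stochastic sensitivity std error:", round(delta_se, 1))
print("bootstrap mean:", round(boot_mean, 1), " bootstrap std dev:", round(boot_sd, 1))
```

In a real application each bootstrap replicate requires a full optimization, which is the additional computation time mentioned above; the asymptotic formula needs only one.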

10.5 Supplemental Materials

10.5.1 Further Resources and Reading

This chapter is based on the work in Frees and Shi (2024).

10.5.2 Exercises

Section 10.1 Exercises

Exercise 10.1. Quota Share. Use the Perturbation Sensitivity Proposition to establish the results in Exercise 9.10.

Exercise 10.2. Quota Share. Use the same set-up as in Exercise 10.1, except now take the auxiliary variable to be \(a = \sigma\), where \(\sigma\) is an element of \(\boldsymbol \Sigma\). Use the Perturbation Sensitivity Proposition to establish the results in Exercise 9.11.

Exercise 10.3. Asset Allocation. Use the Perturbation Sensitivity Proposition to establish the results in Section 9.4.1.

Exercise 10.4. Asset Allocation. As a follow-up to Exercise 10.3, the interest is now in the sensitivity to the uncertainty parameters \(\sigma\). Use the Perturbation Sensitivity Proposition to establish the results in Exercise 9.12.

Section 10.2 Exercises

Exercise 10.5. Excess of Loss \(VaR\) Sensitivity. Continue with Example 10.3.

a. For some new baseline optimizations, determine the optimal upper limits using \(VaR\) as the risk measure, for \(\alpha =0.95, 0.85, 0.75\). Provide code to replicate the work in Table 10.10.

Table 10.10: Excess of Loss: Baseline Results from Seven Optimization Problems

| | Std Dev | \(VaR:\) 0.95 | \(VaR:\) 0.85 | \(VaR:\) 0.75 | \(ES:\) 0.95 | \(ES:\) 0.85 | \(ES:\) 0.75 |
|---|---|---|---|---|---|---|---|
| Optimal \(u_1\) | 5064.60 | 5480.40 | 4985.57 | 4974.50 | 5480.40 | 5364.56 | 5092.53 |
| Optimal \(u_2\) | 1782.48 | 1210.94 | 1944.56 | 1969.36 | 1210.94 | 1336.12 | 1730.78 |
| Optimal \(LME\) | 1.13 | 4.14 | 2.56 | 0.00 | 4.14 | 3.66 | 2.95 |
| Std Dev | 1906.24 | 1927.92 | 1907.35 | 1907.70 | 1927.92 | 1918.27 | 1906.35 |
| \(VaR:\) 0.95 | 6847.07 | 6691.33 | 6930.12 | 6943.86 | 6691.33 | 6700.68 | 6823.31 |
| \(VaR:\) 0.85 | 6208.36 | 6400.16 | 6174.11 | 6169.40 | 6400.16 | 6344.77 | 6220.65 |
| \(VaR:\) 0.75 | 5615.22 | 5819.67 | 5573.30 | 5567.30 | 5819.67 | 5765.06 | 5629.62 |
| \(ES:\) 0.95 | 6847.07 | 6691.33 | 6930.12 | 6943.86 | 6691.33 | 6700.68 | 6823.31 |
| \(ES:\) 0.85 | 6700.66 | 6655.38 | 6734.60 | 6740.14 | 6655.38 | 6648.79 | 6690.92 |
| \(ES:\) 0.75 | 6375.39 | 6429.27 | 6379.93 | 6381.02 | 6429.27 | 6402.87 | 6375.06 |

b. Using \(nsim = 2,000,000\) simulated realizations of losses, determine the risk retention and Lagrange multiplier sensitivities for each of the seven optimization problems in part (a). Your results should be comparable to those in Table 10.11.

Table 10.11: Excess of Loss Sensitivities with Auxiliary a = Scale 2

| | Std Dev | \(VaR:\) 0.95 | \(VaR:\) 0.85 | \(VaR:\) 0.75 | \(ES:\) 0.95 | \(ES:\) 0.85 | \(ES:\) 0.75 |
|---|---|---|---|---|---|---|---|
| Sensitivity \(u_1\) | 10.919 | 11.540 | 14.202 | -167.924 | 11.540 | 9.472 | 9.466 |
| Sensitivity \(u_2\) | 16.002 | 16.634 | 7.704 | 420.143 | 16.634 | 18.968 | 18.884 |
| Sensitivity \(LME\) | 0.002 | 0.018 | -0.008 | 0.168 | 0.018 | 0.005 | 0.003 |

c. You note that the results for the \(VaR\) optimization problems seem to be a bit counter-intuitive. To check their accuracy, repeat part (b) 10 times and calculate simulation standard errors. Summarize the mean, or average, of these 10 replications as in Table 10.12 and the standard error as in Table 10.13.

Table 10.12: Mean Excess of Loss Sensitivities with Auxiliary a = Scale 2

| | Std Dev | \(VaR:\) 0.95 | \(VaR:\) 0.85 | \(VaR:\) 0.75 | \(ES:\) 0.95 | \(ES:\) 0.85 | \(ES:\) 0.75 |
|---|---|---|---|---|---|---|---|
| Sensitivity \(u_1\) | 10.75 | 11.54 | -2.54 | 16.26 | 11.54 | 9.11 | 9.57 |
| Sensitivity \(u_2\) | 16.32 | 16.63 | 44.79 | 2.69 | 16.63 | 19.39 | 18.69 |
| Sensitivity \(LME\) | 0.00 | 0.02 | 0.01 | -0.01 | 0.02 | 0.00 | 0.00 |

Table 10.13: Standard Error Excess of Loss Sensitivities with Auxiliary a = Scale 2

| | Std Dev | \(VaR:\) 0.95 | \(VaR:\) 0.85 | \(VaR:\) 0.75 | \(ES:\) 0.95 | \(ES:\) 0.85 | \(ES:\) 0.75 |
|---|---|---|---|---|---|---|---|
| Sensitivity \(u_1\) | 0.1890 | 0 | 12.5151 | 20.5931 | 0 | 0.1698 | 0.2093 |
| Sensitivity \(u_2\) | 0.3588 | 0 | 27.7274 | 46.6745 | 0 | 0.1985 | 0.3776 |
| Sensitivity \(LME\) | 0.0001 | 0 | 0.0608 | 0.0490 | 0 | 0.0003 | 0.0002 |

d. One can use the part (b) strategy to investigate the sensitivity of shape parameters. For example, suppose that the shape parameter for the first risk is the auxiliary variable of interest. Your results should be comparable to those in Table 10.14.

Table 10.14: Excess of Loss Sensitivities with Auxiliary a = Shape 1

| | Std Dev | \(ES:\) 0.95 | \(ES:\) 0.85 | \(ES:\) 0.75 |
|---|---|---|---|---|
| Sensitivity \(u_1\) | -6.581 | 1119.928 | -257.704 | -1.605 |
| Sensitivity \(u_2\) | 49.234 | -1090.854 | 331.324 | 38.856 |
| Sensitivity \(LME\) | 0.006 | 0.027 | 1.422 | -0.004 |

10.5.3 Appendix. Kernel Smoothing of Expectations, with Derivatives

We start with a function \(h(\cdot)\) and use a kernel density estimator \(f_{Rk}(\cdot)\), built from a kernel \(k(\cdot)\) and bandwidth \(b\), to define \(\mathrm{E}_{Rk}[h(Y)]\). With the change of variables \(y = (\tilde{y} - Y_r)/b\), we can approximate \(\mathrm{E}[h(Y)]\) using \[ \begin{array}{ll} \mathrm{E}_{Rk}[h(Y)] &= \int h(\tilde{y}) f_{Rk}(\tilde{y}) d\tilde{y} \\ &=\frac{1}{R~b} \sum_{r=1}^R \int h(\tilde{y}) k\left(\frac{\tilde{y} - Y_r}{b}\right) d\tilde{y} \\ &={\LARGE \int} \left\{ \frac{1}{R} \sum_{r=1}^R h[Y_r + by] \right\} ~k(y) dy \\ &={\LARGE \int} \left\{ \mathrm{E}_{R} h[Y_r + by] \right\} ~k(y) dy .\\ \end{array} \]

Expected Shortfall. Applying this result to the expected shortfall auxiliary function, we get \[ \begin{array}{ll} ES1_{Rk}(z_0, \boldsymbol \theta) &= z_0 + \frac{1}{1-\alpha} \left\{\mathrm{E}_{Rk}[g({\bf X}; \boldsymbol \theta)] -\mathrm{E}_{Rk}[g({\bf X}; \boldsymbol \theta) \wedge z_0]\right\} \\ &= z_0 + \frac{1}{1-\alpha} \left\{\mathrm{E}_{Rk}[g({\bf X}; \boldsymbol \theta) - z_0]_+\right\} \\ &=z_0 + \frac{1}{1-\alpha}\frac{1}{R} \sum_{r=1}^R \int [g({\bf X}_r; \boldsymbol \theta) + by -z_0]_+ ~k(y) dy \\ &=z_0 + \frac{1}{1-\alpha} {\LARGE \int} \left\{ \frac{1}{R} \sum_{r=1}^R [g({\bf X}_r; \boldsymbol \theta) + by -z_0]_+ \right\} ~k(y) dy .\\ \end{array} \]

Expected Shortfall Gradients. Now note that \[ \begin{array}{lll} &\partial_{z_0}~ [g({\bf X}_r; \boldsymbol \theta) + by -z_0]_+ &= -I\left(y >[z_0-g({\bf X}_r; \boldsymbol \theta)]/b\right) \\ \text{and} \\ &\partial_{\boldsymbol \theta}~ [g({\bf X}_r; \boldsymbol \theta) + by -z_0]_+ &= I\left(y >[z_0-g({\bf X}_r; \boldsymbol \theta)]/b\right) \partial_{\boldsymbol \theta} g({\bf X}_r; \boldsymbol \theta) .\\ \end{array} \] Using the observation that \(\int I(y > c) k(y) dy = 1 - K(c)\) for a constant \(c\), where \(K(\cdot)\) is the distribution function corresponding to \(k(\cdot)\), we have the partial derivatives \[ \partial_{z_0}~ES1_{Rk}(z_0, \boldsymbol \theta) =1 - \frac{1}{1-\alpha}\frac{1}{R} \sum_{r=1}^R \left\{1-K\left(\frac{z_0-g({\bf X}_r; \boldsymbol \theta)}{b}\right) \right\} \] and \[ \partial_{\boldsymbol \theta}~ES1_{Rk}(z_0, \boldsymbol \theta) =\frac{1}{1-\alpha}\frac{1}{R} \sum_{r=1}^R \left\{1-K\left(\frac{z_0-g({\bf X}_r; \boldsymbol \theta)}{b}\right)\right\} \partial_{\boldsymbol \theta} g({\bf X}_r; \boldsymbol \theta) , \] which is sufficient for the result.
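
The gradient expressions can be checked numerically. The following Python sketch assumes a Gaussian kernel (the appendix does not fix a particular kernel), for which \(\int [m + by]_+ k(y) dy = m\,K(m/b) + b\,k(m/b)\), and a quota share retained loss \(g({\bf X}; \boldsymbol \theta) = \boldsymbol \theta'{\bf X}\). The simulated lognormal losses, the evaluation point, and all function names are hypothetical. It compares the analytic partial derivatives above with central finite differences of \(ES1_{Rk}\).

```python
import math
import random

def norm_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def norm_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def retained_loss(x, theta):
    """Quota share g(X; theta) = theta'X (an assumed special case)."""
    return sum(t * xi for t, xi in zip(theta, x))

def es1(z0, theta, data, b, alpha):
    """Kernel-smoothed ES1 with a Gaussian kernel, using the closed-form inner integral."""
    total = 0.0
    for x in data:
        m = retained_loss(x, theta) - z0
        total += m * norm_cdf(m / b) + b * norm_pdf(m / b)
    return z0 + total / ((1.0 - alpha) * len(data))

def es1_gradient(z0, theta, data, b, alpha):
    """Analytic partial derivatives of ES1 in z0 and theta."""
    scale = 1.0 / ((1.0 - alpha) * len(data))
    d_z0 = 1.0
    d_theta = [0.0] * len(theta)
    for x in data:
        tail = 1.0 - norm_cdf((z0 - retained_loss(x, theta)) / b)  # 1 - K(.)
        d_z0 -= scale * tail
        for j, xj in enumerate(x):
            d_theta[j] += scale * tail * xj  # d g / d theta_j = x_j for quota share
    return d_z0, d_theta

# Hypothetical simulated losses and evaluation point
random.seed(1)
data = [(random.lognormvariate(0.0, 0.5), random.lognormvariate(0.0, 0.5)) for _ in range(500)]
z0, theta, b, alpha = 1.7, [0.6, 0.4], 0.1, 0.9

d_z0, d_theta = es1_gradient(z0, theta, data, b, alpha)

# Central finite differences for comparison
h = 1e-5
fd_z0 = (es1(z0 + h, theta, data, b, alpha) - es1(z0 - h, theta, data, b, alpha)) / (2 * h)
fd_theta = []
for j in range(len(theta)):
    up, down = list(theta), list(theta)
    up[j] += h
    down[j] -= h
    fd_theta.append((es1(z0, up, data, b, alpha) - es1(z0, down, data, b, alpha)) / (2 * h))

print("d/dz0    analytic:", d_z0, " finite difference:", fd_z0)
print("d/dtheta analytic:", d_theta, " finite difference:", fd_theta)
```

Other kernels would change only the closed-form inner integral and the function playing the role of \(K\); the structure of the gradient is the same.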

Expected Shortfall Hessian. The Hessian can be written as \[ \partial_{\bf z}\partial_{\bf z'}~ES1_{Rk}({\bf z}) = \left( \begin{array}{cc} \partial_{z_0} \partial_{z_0}~ES1_{Rk}(z_0, \boldsymbol \theta) & \partial_{z_0} \partial_{\boldsymbol \theta'}~ES1_{Rk}(z_0, \boldsymbol \theta) \\ \partial_{\boldsymbol \theta} \partial_{z_0}~ES1_{Rk}(z_0, \boldsymbol \theta) & \partial_{\boldsymbol \theta} \partial_{\boldsymbol \theta'}~ES1_{Rk}(z_0, \boldsymbol \theta) \end{array} \right) . \] For second derivatives, we have \[ \begin{array}{ll} \partial_{z_0z_0}~ES1_{Rk}(z_0) &= \frac{1}{1-\alpha}\frac{1}{Rb} \sum_{r=1}^R \left\{k\left(\frac{z_0-g({\bf X}_r; \boldsymbol \theta)}{b}\right) \right\} \\ \partial_{z_0\boldsymbol \theta}~ES1_{Rk}(z_0) &= -\frac{1}{1-\alpha}\frac{1}{Rb} \sum_{r=1}^R \left\{k\left(\frac{z_0-g({\bf X}_r; \boldsymbol \theta)}{b}\right) \partial_{\boldsymbol \theta} g({\bf X}_r; \boldsymbol \theta) \right\} \\ \partial_{\boldsymbol \theta \boldsymbol \theta'}~ES1_{Rk}(z_0) &=\frac{1}{1-\alpha}\frac{1}{R} \sum_{r=1}^R \left[\left\{1-K\left(\frac{z_0-g({\bf X}_r; \boldsymbol \theta)}{b}\right)\right\} \partial_{\boldsymbol \theta\boldsymbol \theta'} g({\bf X}_r; \boldsymbol \theta) \right.\\ &~~~~~~~~~~~~~~~~~~+\frac{1}{b}\left.k\left(\frac{z_0-g({\bf X}_r; \boldsymbol \theta)}{b}\right) \partial_{\boldsymbol \theta} g({\bf X}_r; \boldsymbol \theta) ~ \partial_{\boldsymbol \theta'} g({\bf X}_r; \boldsymbol \theta) \right] .\\ \end{array} \] Note that for both the excess of loss and quota share special cases, we have \(\partial_{\boldsymbol \theta\boldsymbol \theta'} g({\bf X}_r; \boldsymbol \theta) = {\bf 0}\).

To simplify these expressions, define the weight function \(w_r({\bf z}) =k\left(\frac{z_0-g({\bf X}_r; \boldsymbol \theta)}{b}\right)/[(1-\alpha)Rb]\). Assuming \(\partial_{\boldsymbol \theta\boldsymbol \theta'} g({\bf X}_r; \boldsymbol \theta)\) \(= {\bf 0}\), we may write \[ f_{0,{\bf z} {\bf z}'}({\bf z}) = \sum_{r=1}^R w_r({\bf z}) \left( \begin{array}{cc} 1 &-\partial_{\boldsymbol \theta'} g({\bf X}_r; \boldsymbol \theta) \\ -\partial_{\boldsymbol \theta} g({\bf X}_r; \boldsymbol \theta) & \partial_{\boldsymbol \theta} g({\bf X}_r; \boldsymbol \theta) \partial_{\boldsymbol \theta'} g({\bf X}_r; \boldsymbol \theta) \\ \end{array} \right) . \] Further define \({\bf W}_{\bf z} = diag(w_1({\bf z}), \ldots, w_R({\bf z}))\) and \({\bf G} = [\partial_{\boldsymbol \theta} g({\bf X}_1; \boldsymbol \theta), \ldots, \partial_{\boldsymbol \theta} g({\bf X}_R; \boldsymbol \theta)]'\). With this notation, we may write \[ f_{0,{\bf z} {\bf z}'}({\bf z}) = \left( \begin{array}{cc} {\bf 1' W_z 1} &-{\bf 1'W_z G} \\ -{\bf G 'W_z 1} & {\bf G' W_z G} \\ \end{array} \right) . \] From this, we may write the Hessian of the active set Lagrangian as \[ SLA_{{\bf z} {\bf z}'}({\bf z},{\bf LME}) = \left( \begin{array}{cc} {\bf 1' W_z 1} &-{\bf 1' W_z G} \\ -{\bf G' W_z 1} & {\bf G' W_z G} \\ \end{array} \right) + LME_1 \left\{\begin{array}{cl} {\bf 0} & \text{Quota Share} \\ \boldsymbol \Lambda & \text{Excess of Loss} \\ \end{array} \right. , \] where \(\boldsymbol \Lambda = diag\left[0, f_1(\theta_1), \ldots, f_p(\theta_p) \right]\).

To interpret the curvature of the Hessian, it is also helpful to define \[ {\tilde {\bf X}} = \left( \begin{array}{cc} \sqrt{{\bf W}_{\bf z}}~{\bf 1} & -\sqrt{{\bf W}_{\bf z}}~{\bf G} \\ \end{array} \right) \] where we define \(\sqrt{{\bf W}_{\bf z}} = diag(\sqrt{w_1({\bf z})}, \ldots,\sqrt{ w_R({\bf z})})\). With this, one can write \[ SLA_{{\bf z} {\bf z}'}({\bf z},{\bf LME}) = {\tilde {\bf X}}' {\tilde {\bf X}} + LME_1 \left\{\begin{array}{cl} {\bf 0} & \text{Quota Share} \\ \boldsymbol \Lambda & \text{Excess of Loss} \\ \end{array} \right. . \] This verifies equation (10.3).

10.5.4 Appendix. Proof of the Perturbation Sensitivity Proposition

The proof is based on results from Fiacco (1976) and Fiacco (1983). To keep the presentation self-contained, we begin by restating relevant results from these works using the notation of this chapter. For simplicity, we ignore inequality constraints that are not binding at the optimum and use only constraints that are binding at the optimum.

At this stage, we consider general objective and constraint functions that need not be related to risk retention. The constrained optimization problem is \[\begin{equation} \boxed{ \begin{array}{ccc} {\small \text{minimize}}_{\bf z} & f_0({\bf z}, a) \\ {\small \text{subject to}} & ~~~~~f_{con,j}({\bf z},a) \le 0 &j=1, \ldots, m . \end{array} } \tag{10.6} \end{equation}\] In this formulation, the auxiliary variable \(a\) is perturbed about 0. The corresponding (shortened) Lagrangian is \[ SLA({\bf z}, a) = f_0({\bf z}, a) + \sum_{j=1}^m LME_j f_{con,j}({\bf z},a) . \] We need the following two sets of conditions that appear in Fiacco (1976), Lemma 2.1.

First Order Karush-Kuhn-Tucker Conditions. \[ \boxed{ \begin{array}{cl} SLA_{{\bf z}}^* &= f_{0,{\bf z}}^* + \sum_{j=1}^m LME_j^* ~ f_{con,j,{\bf z}}^* = {\bf 0} \\ f_{con,j}^* & \le 0 ~~~~~~(j=1, \ldots, m)\\ LME_j^* ~ f_{con,j}^* & = 0 ~~~~~~(j=1, \ldots, m) . \end{array} } \]

Second Order Conditions. \[ \boxed{ \begin{array}{cll} {\bf y}' ~SLA_{{\bf z} {\bf z}'}^* {\bf y} & >0 & \text{for all } {\bf y}\ne {\bf 0} \text{ such that}\\ {\bf y}' f_{con,j,{\bf z}}^* & \le 0 & (j=1, \ldots, m) ~~~\text{where } f_{con,j}^*=0\\ {\bf y}' f_{con,j,{\bf z}}^* & = 0 & (j=1, \ldots, m) ~~~\text{where } LME_j^* > 0 . \end{array} } \]

The following result provides first-order sensitivity results for a second order local minimizing point.

Theorem 2.1 and Corollary 2.1 of Fiacco (1976). Suppose

  • the functions \(f_0({\bf z}, a)\) and \(f_{con,j}({\bf z},a)\), \(j=1, \ldots, m\), are twice continuously differentiable in \(({\bf z}, a)\) in a neighborhood of \(({\bf z}^*, 0)\),
  • the Lemma 2.1 first order \(KKT\) and second order conditions hold,
  • for \(a\) near 0, the gradients \(f_{con,j,{\bf z}}({\bf z}^*,a)\) are linearly independent, and
  • \(LME_j^* > 0\) when \(f_{con,j}^* = 0\), \(j=1, \ldots, m\) (strict complementary slackness).

Then, \[\begin{equation} {\small \begin{array}{ll} &\left(\begin{array}{c} \partial_{a} {\bf z}^* \\ \partial_{a} \mathbf{LME}^* \end{array} \right) = \\ & ~~~~~~~ \left(\begin{array}{ccccc} SLA_{{\bf z} {\bf z}'}^*& f_{con,1,{\bf z}}^* & \cdots &f_{con,m,{\bf z}}^*\\ -LME_1^* ~ f_{con,1,{\bf z}'}^* & 0 & \cdots & 0 \\ \vdots & & \ddots &\vdots\\ -LME_m^* ~ f_{con,m,{\bf z}'}^* & 0 & \cdots &0 \\ \end{array} \right)^{-1} \left(\begin{array}{c} - SLA_{{\bf z},a}^*\\ LME_1^* ~ f_{con,1,a}^* \\ \vdots \\ LME_m^* ~ f_{con,m,a}^* \\ \end{array} \right) . \end{array} \tag{10.7} } \end{equation}\]


Proof of the Perturbation Sensitivity Proposition. We now use equation (10.7) to establish the perturbation sensitivity results. As in Fiacco (1983) (Section 4.2), we define the \(m \times m\) diagonal matrix \({\bf D_{LME}}^* = diag(LME_1^*, \ldots, LME_m^*)\). We also recall \(f_{con,\bullet,{\bf z}}({\bf z},a)\) \(=\left[f_{con,1,{\bf z}}({\bf z},a), \ldots, f_{con,m,{\bf z}}({\bf z},a)\right]\), a \(p_z \times m\) gradient matrix which, when evaluated at the optimum, is \(f_{con,\bullet,{\bf z}}^*\). With this notation, equation (10.7) yields \[\begin{equation} {\small \begin{array}{ll} & \left(\begin{array}{c} \partial_{a} {\bf z}^* \\ \partial_{a} \mathbf{LME}^* \end{array} \right) \\ &~~~~ = \left[ \left(\begin{array}{cc} {\bf I}_{p_z} & {\bf 0} \\ {\bf 0} & {\bf D_{LME}}^* \end{array} \right) \left(\begin{array}{ccccc} SLA_{{\bf z} {\bf z}'}^*& f_{con,1,{\bf z}}^* & \cdots &f_{con,m,{\bf z}}^*\\ -f_{con,1,{\bf z}'}^* & 0 & \cdots & 0 \\ \vdots & & \ddots &\vdots\\ -~ f_{con,m,{\bf z}'}^* & 0 & \cdots &0 \\ \end{array} \right) \right]^{-1} \\ & ~~~~~~~~~~~~\left(\begin{array}{cc} {\bf I}_{p_z} & {\bf 0} \\ {\bf 0} & {\bf D_{LME}}^* \end{array} \right) \left(\begin{array}{c} - SLA_{{\bf z},a}^*\\ f_{con,1,a}^* \\ \vdots \\ f_{con,m,a}^* \\ \end{array} \right) \\ \\ &~~~~ = \left[ \begin{array}{ccccc} SLA_{{\bf z} {\bf z}'}^* & f_{con,\bullet,{\bf z}}^*\\ -f_{con,\bullet,{\bf z}'}^* & {\bf 0} \\ \end{array} \right]^{-1} \left(\begin{array}{c} -SLA_{{\bf z},a}^*\\ f_{con,\bullet,a}^* \\ \end{array} \right) . 
\tag{10.8} \end{array} } \end{equation}\] From a standard result on block matrices, we use \[ \left( \begin{array}{cc} {\bf A}_{11} & {\bf A}_{12}\\ {\bf A}_{21} & {\bf A}_{22} \\ \end{array} \right)^{-1} = \left( \begin{array}{cc} {\bf A}_{11}^{-1} + {\bf A}_{11}^{-1} {\bf A}_{12}{\bf A}_{22\cdot 11}^{-1}{\bf A}_{21}{\bf A}_{11}^{-1} & -{\bf A}_{11}^{-1} {\bf A}_{12}{\bf A}_{22\cdot 11}^{-1}\\ -{\bf A}_{22\cdot 11}^{-1}{\bf A}_{21}{\bf A}_{11}^{-1} & {\bf A}_{22\cdot 11}^{-1} \\ \end{array} \right) , \] that requires \({\bf A}_{11}\) to be invertible but not \({\bf A}_{22}\). Here, \({\bf A}_{22\cdot 11} = {\bf A}_{22} - {\bf A}_{21} {\bf A}_{11}^{-1}{\bf A}_{12}\).

Use \(SLA_{{\bf z} {\bf z}'}^*\) for \({\bf A}_{11}\), \(f_{con,\bullet,{\bf z}}^*\) for \({\bf A}_{12}\), \(-f_{con,\bullet,{\bf z}'}^*\) for \({\bf A}_{21}\), and \(\bf 0\) for \({\bf A}_{22}\). With these choices, \({\bf A}_{22\cdot 11}\) equals \({\bf M} = f_{con,\bullet,{\bf z}'}^{*} ~SLA_{{\bf z} {\bf z}'}^{*-1} ~f_{con,\bullet,{\bf z}}^*\).

From the second row of blocks in Display (10.8), we have \[ \begin{array}{ll} \partial_{a} \mathbf{LME}^* &= {\bf A}_{22\cdot 11}^{-1} \left\{-{\bf A}_{21}{\bf A}_{11}^{-1} (-SLA_{{\bf z},a}^* )+ (f_{con,\bullet,a}^*) \right\}\\ &= {\bf M}^{-1} \left\{f_{con,\bullet,a}^* - f_{con,\bullet,{\bf z}'}^{*} ~SLA_{{\bf z} {\bf z}'}^{*-1} ~SLA_{{\bf z},a}^* \right\} , \end{array} \] which is sufficient for equation (10.1).

From the first row of blocks in Display (10.8), we have \[ \begin{array}{ll} &\partial_{a} ~{\bf z}^* \\ &~~~~= \left\{ {\bf A}_{11}^{-1} + {\bf A}_{11}^{-1} {\bf A}_{12}{\bf A}_{22\cdot 11}^{-1}{\bf A}_{21}{\bf A}_{11}^{-1} \right\}(-SLA_{{\bf z},a}^*) \\ &~~~~~~~~~~~~~~~~~~~~ -\left\{ {\bf A}_{11}^{-1} {\bf A}_{12}{\bf A}_{22\cdot 11}^{-1} \right\}(f_{con,\bullet,a}^*)\\ &~~~~= -SLA_{{\bf z} {\bf z}'}^{*-1} SLA_{{\bf z},a}^*+ SLA_{{\bf z} {\bf z}'}^{*-1} f_{con,\bullet,{\bf z}}^{*}{\bf M}^{-1}f_{con,\bullet,{\bf z}'}^{*} SLA_{{\bf z} {\bf z}'}^{*-1} SLA_{{\bf z},a}^* \\ &~~~~~~~~~~~~~~~~~~~~ -SLA_{{\bf z} {\bf z}'}^{*-1} f_{con,\bullet,{\bf z}}^{*}{\bf M}^{-1} f_{con,\bullet,a}^*\\ &~~~~=SLA_{{\bf z} {\bf z}'}^{*-1} \left\{ -SLA_{{\bf z},a}^*+ f_{con,\bullet,{\bf z}}^{*}{\bf M}^{-1} \left[f_{con,\bullet,{\bf z}'}^{*} SLA_{{\bf z} {\bf z}'}^{*-1} SLA_{{\bf z},a}^* - f_{con,\bullet,a}^* \right] \right\}\\ &~~~~= -SLA_{{\bf z} {\bf z}'}^{*-1} \left\{ SLA_{{\bf z},a}^* + f_{con,\bullet,{\bf z}}^{*} ~ \partial_{a} \mathbf{LME}^* \right\} , \\ \end{array} \] which is sufficient for equation (10.2).
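
The two closed-form expressions obtained from the block inverse can be sanity-checked on a toy problem. The Python sketch below uses a hypothetical quadratic program, unrelated to risk retention: minimize \(f_0({\bf z}, a) = \frac{1}{2}(2z_1^2 + z_2^2) - a z_1\) subject to the single constraint \(f_{con}({\bf z}, a) = 1 + a - z_1 - z_2 \le 0\), which is binding with a positive multiplier near \(a = 0\). Its Karush-Kuhn-Tucker system can be solved by hand, so the formula-based sensitivities can be compared with finite differences of the exact solution. All names are illustrative.

```python
def inv_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(A, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def solve_kkt(a):
    """Exact solution of the toy problem (hand-derived from its KKT system)."""
    lme = (2.0 + a) / 3.0
    z2 = lme                 # stationarity in z2: z2 - lme = 0
    z1 = 1.0 + a - z2        # binding constraint: z1 + z2 = 1 + a
    return [z1, z2], lme

# Ingredients of the sensitivity formulas, evaluated at a = 0
sla_zz = [[2.0, 0.0], [0.0, 1.0]]   # Hessian of the Lagrangian in z
f_con_z = [-1.0, -1.0]              # gradient of the constraint in z
sla_za = [-1.0, 0.0]                # derivative of the Lagrangian gradient in a
f_con_a = 1.0                       # derivative of the constraint in a

sla_zz_inv = inv_2x2(sla_zz)

# M = f_con_z' SLA_zz^{-1} f_con_z (a scalar here because m = 1)
M = dot(f_con_z, matvec(sla_zz_inv, f_con_z))

# Multiplier sensitivity: M^{-1} { f_con_a - f_con_z' SLA_zz^{-1} SLA_za }
d_lme = (f_con_a - dot(f_con_z, matvec(sla_zz_inv, sla_za))) / M

# Decision variable sensitivity: -SLA_zz^{-1} { SLA_za + f_con_z * d_lme }
inner = [s + f * d_lme for s, f in zip(sla_za, f_con_z)]
d_z = [-v for v in matvec(sla_zz_inv, inner)]

# Finite-difference check against the exact solution
h = 1e-6
(z_up, lme_up), (z_dn, lme_dn) = solve_kkt(h), solve_kkt(-h)
fd_z = [(u - d) / (2 * h) for u, d in zip(z_up, z_dn)]
fd_lme = (lme_up - lme_dn) / (2 * h)

print("formula:", d_z, d_lme)
print("finite difference:", fd_z, fd_lme)
```

No additional optimization run is needed for the formula-based values; only derivatives at the baseline optimum enter.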


Proof of the Linear Constraint Corollary. Multiply each side of equation (10.7) by the matrix whose inverse appears on the right-hand side. Examining the bottom \(m\) rows of the result gives \[ -LME_j^* ~ f_{con,j,{\bf z}'}^* ~ \partial_{a} {\bf z}^* = LME_j^* ~ f_{con,j,a}^* , \] for \(j=1, \ldots, m\).

Now consider a linear constraint of the form \(f_{con,j}({\bf z}) = {\bf p}'_j {\bf z} - p_0 \le 0\), for a constant \(p_0\) and vector \({\bf p}_j\). For partial derivatives, we have \(f_{con,j,a}({\bf z})=0\) and \(f_{con,j,{\bf z}}({\bf z})= {\bf p}_j\). At the optimum \(f_{con,j,a}({\bf z}^*)=0\) and by the strict complementary slackness, the corresponding Lagrange multiplier is \(LME_j^* >0\).

Thus, with the prior expression we can cancel out \(LME_j^*\), to get \[ \begin{array}{lll} 0&=-f_{con,j,a}({\bf z}^*) = f_{con,j,{\bf z}'}({\bf z}^*)\partial_{a} {\bf z}^*(a) \\ &= {\bf p}'_j ~\partial_{a} {\bf z}^*(a), \end{array} \] which is sufficient for the result.