Chapter 11 Risk Retention Conditions

Chapter Preview. When using the method of constrained optimization to propose an effective risk retention policy, it can be valuable to interpret the conditions necessary for transferring a risk. This chapter introduces the Karush-Kuhn-Tucker, or KKT, conditions that must be satisfied for an optimal risk transfer solution. To underscore their relevance, we initially revisit the framework of Chapter 3 for a single risk and demonstrate how the KKT framework can summarize the detailed work developed in that context.

Moving to the multivariate case, these conditions can verify classical conditions on balance among risks for multivariate excess of loss and extend them to encompass cases of dependence. This framework is subsequently employed to establish conditions to achieve (1) a binding budget, (2) a balance among retention parameters at the optimum, and (3) boundary constraints. All these conditions rely on the risk measure relative marginal change (\(RM^2\)). Additionally, because of its significance in multivariate risk retention, special consideration is given to risk retention within a simulation context. Specifically, this framework is utilized to establish the equivalence between quantile regression and \(ES\) optimization.

11.1 KKT Conditions

We now return to the general constrained optimization problem in Display (3.9) and the associated Lagrangian in equation (3.10). Suppose that \({\bf z}^*\) is a local minimizer on the feasible set. Then, there exist multipliers \(\mathbf{LMI}^* = \left(LMI_j^*; ~j \in CON_{in} \right)^{\prime}\) and \(\mathbf{LME}^* = \left(LME_j^*; ~j \in CON_{eq} \right)^{\prime}\) that satisfy the

Karush-Kuhn-Tucker (\(KKT\)) Conditions. \[\begin{equation} \begin{array}{ll} \partial_{z_i} ~ \left. LA\left({\bf z},\mathbf{LMI}^*,\mathbf{LME}^* \right) \right|_{{\bf z}={\bf z}^*} = 0 & i=1, \ldots, p_z \\ LMI_j^* f_{con,j}({\bf z}^*) = 0 & j \in CON_{in} \\ LMI_j^* \ge 0 & j \in CON_{in} \\ f_{con,j}({\bf z}^*) \le 0 & j \in CON_{in} \\ f_{con,j}({\bf z}^*) = 0 & j \in CON_{eq} . \\ \end{array} \tag{11.1} \end{equation}\]

The conditions in Display (11.1) are known as the Kuhn-Tucker or the Karush-Kuhn-Tucker (KKT) conditions, named after their originators. Mathematically, they are necessary conditions; that is, if a point is a local optimum, then the conditions must hold. Many constrained optimization algorithms can be viewed as methods for solving these conditions, and they are used extensively in both theoretical and numerical analyses.

The KKT conditions are also referred to as first-order necessary conditions as they are based on first derivatives. Being derivative-based, these conditions only ensure a point is a local optimum and do not indicate global behavior. Additional constraints, such as convexity or conditions on second derivatives, are necessary to guarantee a point is a global minimizer. It is important to note that these are (partial) derivatives with respect to the decision variables. This differs from the sensitivity analysis in Chapter 9, which focused on derivatives with respect to auxiliary parameters.
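Before turning to risk retention, a minimal numerical sketch may help fix ideas. The toy two-variable problem below (all values hypothetical, not from the text) minimizes a smooth objective with a single inequality constraint, recovers the multiplier from the stationarity condition, and checks the remaining conditions in Display (11.1).

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem (hypothetical values): minimize (z1-1)^2 + (z2-2)^2
# subject to f_con(z) = z1 + z2 - 2 <= 0; the analytic solution is (0.5, 1.5).
f0 = lambda z: (z[0] - 1.0) ** 2 + (z[1] - 2.0) ** 2
f_con = lambda z: z[0] + z[1] - 2.0

# scipy's "ineq" convention is fun(z) >= 0, so pass the negated constraint
res = minimize(f0, x0=[0.0, 0.0], method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda z: -f_con(z)}])
z_star = res.x

grad_f0 = np.array([2 * (z_star[0] - 1), 2 * (z_star[1] - 2)])
grad_con = np.array([1.0, 1.0])
lmi = -grad_f0[0] / grad_con[0]        # multiplier recovered from stationarity

# KKT checks: stationarity, dual feasibility, complementary slackness
assert np.allclose(grad_f0 + lmi * grad_con, 0.0, atol=1e-4)
assert lmi >= 0 and abs(lmi * f_con(z_star)) < 1e-4
print(z_star, lmi)
```

Because the constraint is active at the solution, the multiplier is strictly positive; deactivating the constraint (for example, replacing the bound 2 by 4) would instead give a zero multiplier with the constraint slack.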

Example 11.1.1. Univariate Risk Retention Conditions. Consider a single (univariate) risk \(X\) and the risk retention function given in Section 2.2 that has parameters \(\boldsymbol \theta = (d,c,u)\). For this example, let us take the coinsurance parameter \(c=1\). For the risk measure, consider the expected shortfall \(ES_{\alpha}[g(X);\boldsymbol \theta]\) given in equation (2.11). For simplicity, use the fair risk transfer cost \(RTC(\boldsymbol \theta)\) \(= \mathrm{E}(X) - \mathrm{E}[g(X;\boldsymbol \theta)]\), with an expression for \(\mathrm{E}[g(X;\boldsymbol \theta)]\) in equation (2.7). Explicitly, consider the risk retention problem \[ \boxed{ \begin{array}{cll} {\small \text{minimize}}_{d,u} & f_0(d,u) =ES_{\alpha}[g(X);d,u] \\ {\small \text{subject to}} & f_{con,1}(d,u) = RTC(d,u) - RTC_{max} \le 0 & \\ & d \ge 0, ~~~~~~~~ u \ge d .\\ \end{array} } \] As in Section 3.5, we assume that the \(RTC_{max}\) is sufficiently small so that the largest feasible deductible is smaller than the \(\alpha\) quantile of the risk \(X\), that is, \(d_{max} < F^{-1}_{\alpha}\). Under this constraint, one can use the KKT conditions to verify that the optimal deductible is \(d^*=0\).
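A simulation sketch can illustrate this conclusion. The sketch below is not from the text: it assumes an Exponential(1) risk, \(\alpha = 0.95\), a fair-cost budget \(RTC_{max} = 0.02\), and takes the Section 2.2 retention (with \(c=1\)) to be the retained layer between \(d\) and \(u\), consistent with the signs in Display (11.2). A grid search along the binding budget confirms that the empirical \(ES\) is minimized at \(d^* = 0\).

```python
import numpy as np

rng = np.random.default_rng(2024)
X = rng.exponential(scale=1.0, size=200_000)   # illustrative Exponential(1) risk
alpha, RTC_max = 0.95, 0.02

def retained(x, d, u):
    # c = 1 retention: the layer between d and u is retained
    return np.minimum(x, u) - np.minimum(x, d)

def es(losses, alpha):
    # empirical expected shortfall: average of the worst (1 - alpha) share
    q = np.quantile(losses, alpha)
    return losses[losses >= q].mean()

# Fair transfer cost for Exponential(1):
#   RTC(d, u) = E(X) - {E(X ^ u) - E(X ^ d)} = 1 - exp(-d) + exp(-u),
# so a binding budget gives exp(-u) = RTC_max - 1 + exp(-d).
d_max = -np.log(1.0 - RTC_max)                 # largest feasible deductible
ds = np.linspace(0.0, 0.95 * d_max, 40)        # stay inside the feasible region
us = -np.log(RTC_max - 1.0 + np.exp(-ds))
es_vals = np.array([es(retained(X, d, u), alpha) for d, u in zip(ds, us)])
print(ds[es_vals.argmin()])                    # minimizing deductible; expect 0
```

Note that \(d_{max} = -\ln(1-RTC_{max}) \approx 0.02\) is far below \(F^{-1}_{\alpha} \approx 3.0\) here, so the sufficiency condition in the text is comfortably satisfied.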

\(Under~the~Hood.\) Verify that the Optimal \(d=0\)

11.2 Risk Retention Conditions for a Single Risk

I now extend Example 11.1.1 to incorporate coinsurance along with other risk measures. This section revisits the work that was done in Section 3.5, but presenting it within the KKT framework offers a more concise and elegant approach. The focus remains on a single (univariate) risk \(X\) and the risk retention function detailed in Section 2.2 that has parameters \(\boldsymbol \theta = (d,c,u)\). Initially, let us consider a generic risk measure \(RM\) and risk transfer cost \(RTC\). It is worth noting that when \(d=u\), no losses are retained. To introduce some risk retention, assume that the maximal risk transfer cost \(RTC_{max} < RTC(d,c, u=d)\).

Explicitly, consider the risk retention problem \[ \boxed{ \begin{array}{cll} {\small \text{minimize}} & f_0(\boldsymbol \theta) =RM[g(X);\boldsymbol \theta] \\ {\small \text{subject to}} & f_{con,1}(\boldsymbol \theta) = RTC(\boldsymbol \theta) - RTC_{max} \le 0 & \\ & f_{con,2}(\boldsymbol \theta) = c -1 \le 0 \\ & d \ge 0, c \ge 0, u \ge d .\\ \end{array} } \] Here, the constraint on \(f_{con,2}\) ensures \(c \le 1\).

Consistent with Table 2.2, I also assume: \[\begin{equation} \boxed{ {\small \begin{array}{llll} \partial_d RM[g(X);\boldsymbol \theta] \le0 & & \partial_d RTC(\boldsymbol \theta) > 0 \\ \partial_c RM[g(X);\boldsymbol \theta] > 0 & & \partial_c RTC(\boldsymbol \theta) < 0 \\ \partial_u RM[g(X);\boldsymbol \theta] \ge 0 & & \partial_u RTC(\boldsymbol \theta) < 0 . \\ \end{array} } } \tag{11.2} \end{equation}\]

For each parameter \(\theta\) in the vector \(\boldsymbol \theta\), define the risk measure relative marginal change \[ RM^2_{\theta} = - \frac{\partial_{\theta} RM[g(X);\boldsymbol \theta]}{\partial_{\theta} RTC(\boldsymbol \theta)}, \ \ \ \text{for } \theta = d, c, u, \] and, using Display (11.2), note that \(RM^2_{\theta} \ge 0\) for each \(\theta\).

At the optimum values of \(\boldsymbol \theta^* = (d^*,c^*,u^*)\), one can use the KKT conditions to provide sufficient conditions so that the optimal values of parameters are on boundaries, as follows \[\begin{equation} \begin{array}{ccc} &&\text{if } \ \ \ RM^2_{u^*} > RM^2_{d^*}, \ \ \ \text{then } d^*=0 \\ \end{array} \tag{11.3} \end{equation}\] and \[\begin{equation} \begin{array}{ccc} && \text{if } \ \ \ RM^2_{u^*} > RM^2_{c^*}, \ \ \ \text{then } c^*=1. \\ \end{array} \tag{11.4} \end{equation}\]

\(Under~the~Hood.\) Proof of the Single Risk Retention Results

To show how to apply these results, we explore the \(VaR\) and \(ES\) measures. The following Table 11.1 summarizes the \(RM^2\) measures, providing an extension of Table 2.2.

Table 11.1. \(RM^2\) for Value at Risk and Expected Shortfall \[\begin{equation} {\small \begin{matrix} \begin{array}{l | ccc | c} \hline \text{Summary} & & \text{Parameter} (\theta) \\ \text{Measure} & d & c & u & \text{Range of } \alpha\\ \hline \begin{array}{c} RM^2_{\theta} \text{ for} \\ VaR\end{array} & \begin{array}{cc} 0 \\ \frac{1}{1-F(d)} \\ \frac{1}{1-F(d)} \\ \end{array} & \begin{array}{cc} 0 \\ \frac{F^{-1}_{\alpha}-d}{\mathrm{E} (X \wedge u)- \mathrm{E} (X \wedge d) } \\ \frac{u-d}{\mathrm{E} (X \wedge u)- \mathrm{E} (X \wedge d) } \end{array} & \begin{array}{cc} 0 \\ 0 \\ \frac{1}{1-F(u)} \\ \end{array} &\begin{array}{c} \alpha < F(d) \\ F(d) \le \alpha < F(u) \\ F(u) \le \alpha \\ \end{array} \\ \hline \begin{array}{c} RM^2_{\theta} \text{ for} \\ ES\end{array} & \begin{array}{c} \frac{1}{1-\alpha} \\ \frac{1}{1-F(d)} \\ \\ \\ \frac{1}{1-F(d)} \\ \end{array} & \begin{array}{c} \frac{1}{1-\alpha} \\ \frac{1}{1-\alpha} \left\{ \mathrm{E} (X \wedge u)- \mathrm{E} (X \wedge F_{\alpha}^{-1}) \right. \\ \left. +(1-\alpha)(F_{\alpha}^{-1} - d) \right\} / \\ \left\{ \mathrm{E} (X \wedge u)- \mathrm{E} (X \wedge d) \right\} \\ \frac{u-d}{\mathrm{E} (X \wedge u)- \mathrm{E} (X \wedge d) } \\ \end{array} & \begin{array}{c} \frac{1}{1-\alpha} \\ \frac{1}{1-\alpha} \\ \\ \\ \frac{1}{1-F(u)} \\ \end{array} & \begin{array}{c} \alpha < F(d) \\ F(d) \le \alpha < F(u) \\ \\ \\ F(u) \le \alpha \\ \end{array} \\ \hline \end{array} \end{matrix} } \end{equation}\]
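Entries of Table 11.1 can be spot-checked by finite differences. The sketch below (parameter values illustrative) checks the \(RM^2_c\) entry for \(VaR\) in the range \(F(d) \le \alpha < F(u)\), using an Exponential(1) risk and taking the retained risk to be \(c\,[(X \wedge u) - (X \wedge d)]\), so that every piece is available in closed form.

```python
import numpy as np

# Check the Table 11.1 entry RM^2_c for VaR when F(d) <= alpha < F(u),
# for an Exponential(1) risk (illustrative values).
alpha, d, u = 0.90, 0.5, 4.0
q = -np.log(1 - alpha)                        # F^{-1}_alpha
Emin = lambda t: 1 - np.exp(-t)               # E(min(X, t)) for Exponential(1)

var_g = lambda c: c * (q - d)                 # VaR of the retained risk
rtc = lambda c: 1 - c * (Emin(u) - Emin(d))   # fair risk transfer cost

c, h = 0.8, 1e-6                              # finite-difference step h
rm2_fd = -(var_g(c + h) - var_g(c - h)) / (rtc(c + h) - rtc(c - h))
rm2_tab = (q - d) / (Emin(u) - Emin(d))       # Table 11.1 formula
print(rm2_fd, rm2_tab)
```

Both \(VaR\) and \(RTC\) are linear in \(c\) here, so the central difference reproduces the tabulated ratio essentially exactly.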

For both the \(VaR\) and \(ES\) risk measures, from Table 11.1 we see that if the optimal \(u^*<F^{-1}_{\alpha}\), then the condition in Display (11.3) holds and the optimal deductible is \(d^*=0\).

For the expected shortfall measure, suppose that the optimal deductible \(d^*<F^{-1}_{\alpha}\). Then, from Table 11.1 we see that the condition in Display (11.3) holds and the optimal deductible is \(d^*=0\). As in Section 3.5, we can assume that the \(RTC_{max}\) is sufficiently small so that \(d_{max} < F^{-1}_{\alpha}\), thus ensuring that \(d^*<F^{-1}_{\alpha}\).

For the coinsurance parameter, let us again consider the case where the optimal \(u^*<F^{-1}_{\alpha}\). Then, for both the value at risk and the expected shortfall, the condition in Display (11.4) can be expressed as \[ \begin{array}{ll} &\frac{1}{1-F(u^*)}=RM^2_{u^*} > RM^2_{c^*} = \frac{u^*-d^*}{\mathrm{E} (X \wedge u^*)- \mathrm{E} (X \wedge d^*) } \\ \iff &\mathrm{E} (X \wedge u^*)- \mathrm{E} (X \wedge d^*) > (u^*-d^*)[1-F(u^*)] . \end{array} \] Using \[ \begin{array}{ll} \mathrm{E} (X \wedge u^*)- \mathrm{E} (X \wedge d^*) &= \int^{u^*}_{d^*} [1-F(z)]dz \\ &> \int^{u^*}_{d^*}[1-F(u^*)]dz = (u^*-d^*)[1-F(u^*)] ,\\ \end{array} \] we see that this condition is satisfied at the optimum so the optimal parameter is \(c^*=1\).
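The layer inequality above can also be checked numerically. The sketch below uses the identity \(\mathrm{E}(X \wedge u) - \mathrm{E}(X \wedge d) = \int_d^u [1-F(z)]\,dz\); the lognormal risk and the values of \(d\) and \(u\) are illustrative assumptions.

```python
from scipy.integrate import quad
from scipy.stats import lognorm

# Numerical check of E(X ^ u) - E(X ^ d) > (u - d)[1 - F(u)]
# for an illustrative lognormal risk.
F = lognorm(s=1.0).cdf
d, u = 0.5, 3.0

layer, _ = quad(lambda z: 1.0 - F(z), d, u)  # E(X ^ u) - E(X ^ d)
bound = (u - d) * (1.0 - F(u))
print(layer, bound)
assert layer > bound                         # hence c* = 1 at the optimum
```

The inequality is strict whenever \(F\) is strictly increasing on \((d, u)\), since the survival function inside the integral then strictly exceeds its value at \(u\).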

Readers are invited to explore extensions to the range value at risk, \(RVaR\).

11.3 Risk Retention Conditions for Multiple Risks

We now use the KKT conditions introduced in Section 11.1 to investigate general conditions required for achieving an optimal risk portfolio.

11.3.1 Risk Retention Conditions

We start by looking at a risk retention problem in a fairly abstract way and then specialize to problems of interest to us. To be explicit, consider a slight modification of the risk retention problem in Display (9.1), as follows. For simplicity, I drop the auxiliary decision variable \(z_0\) used in expected shortfall and use \(RM\) for a risk measure instead of the generic \(f_0\).

Risk Retention Problem \[\begin{equation} \begin{array}{lc} {\small \text{minimize}_{\boldsymbol \theta}} & RM( \boldsymbol \theta) \\ {\small \text{subject to}} & ~~~~~RTC(\boldsymbol \theta) \le RTC_{max} \\ & {\bf P} \boldsymbol \theta \le {\bf p}_0 ~~. \end{array} \tag{11.6} \end{equation}\]


To permit detailed analysis, define \({\bf P}_i\) to be the \(i\)th row of \(\bf P\) and \(p_{0,i}\) to be the \(i\)th element of \({\bf p}_0\). With this notation, the \((i+1)\)st constraint is \({\bf P}_i ~ \boldsymbol\theta \le p_{0,i}\). Thus, we can write the Lagrangian as \[\begin{equation} \begin{array}{lc} LA( \boldsymbol \theta, {\bf LMI}) &= RM[g(\mathbf{X};\boldsymbol \theta)] + LMI_1[RTC(\boldsymbol \theta) - RTC_{max} ] \\ & ~~~~~~ +\sum_{i=1}^{m-1} LMI_{i+1} \left({\bf P}_i ~ \boldsymbol\theta - p_{0,i}\right) . \end{array} \tag{11.7} \end{equation}\] Using Display (11.1), the risk retention problem KKT conditions are

\(KKT\) Conditions for the Risk Retention Problem \[\begin{equation} \begin{array}{ll} \partial_{\theta_j} ~ \left. LA\left(\boldsymbol \theta,\mathbf{LMI}^* \right) \right|_{\boldsymbol \theta=\boldsymbol \theta^*} = 0 & j=1, \ldots, p_z \\ LMI_1^* [RTC(\boldsymbol \theta^*) - RTC_{max} ] = 0 \\ LMI_{i+1}^* \left[{\bf P}_i ~ \boldsymbol\theta^* - p_{0,i}\right] = 0 & i=1, \ldots, m-1\\ LMI_i^* \ge 0 & i=1, \ldots, m \\ RTC(\boldsymbol \theta^*) \le RTC_{max} \\ {\bf P}_i ~ \boldsymbol\theta^* \le p_{0,i} & i=1, \ldots, m-1 .\\ \end{array} \tag{11.8} \end{equation}\]


Our interest is in summarizing behavior when the optimal values of the risk retention parameters are not on one of the edges defined by the vector constraint \({\bf P} \boldsymbol \theta \le {\bf p}_0\). So, for the \(j\)th decision variable, let us consider a typical row, say the \(i\)th one, where \(\theta_j\) is present. Mathematically, we can express this as \(\partial_{\theta_j} {\bf P}_i \boldsymbol \theta \ne 0\) for some \(\boldsymbol \theta\). For such a row, to ensure that the \(j\)th decision variable is not on an edge, we require \({\bf P}_i ~ \boldsymbol\theta^* < p_{0,i}\). To summarize, we require:

Edge Condition for the \(j\)th Decision Variable. For \(i = 1, \ldots, m-1\), if \(\partial_{\theta_j} {\bf P}_i \boldsymbol \theta \ne 0\) for some \(\boldsymbol \theta\), then \({\bf P}_i ~ \boldsymbol\theta^* < p_{0,i}\).

As before, it is helpful to use the risk measure relative marginal change, \[ RM^2_j(\boldsymbol \theta) = - \frac{\partial_{\theta_j}RM[g(\mathbf{X};\boldsymbol \theta)]}{\partial_{\theta_j}RTC(\boldsymbol \theta)} . \] Problems of interest to us largely adhere to the following:

  • Condition RR1. \(\partial_{\theta_j} ~ \left. RTC\left(\boldsymbol \theta \right) \right|_{\boldsymbol \theta=\boldsymbol \theta^*} \ne 0\).
  • Condition RR2. \(\partial_{\theta_j} ~ \left. RM[g(\mathbf{X};\boldsymbol \theta)] \right|_{\boldsymbol \theta=\boldsymbol \theta^*} \ne 0\).

When the edge condition for the \(j\)th decision variable holds, there are opportunities to consider small changes in the parameter. When this happens, we want these small changes to affect both the risk transfer cost (Condition RR1) and the risk measure (Condition RR2). It is worth noting that the requirement \(0<RM^2_j(\boldsymbol \theta^*) < \infty\) is sufficient for Conditions RR1 and RR2. We typically use this simpler condition in applications.

Binding Budget Constraint

Assume that the edge condition for the \(j\)th decision variable and the corresponding Conditions RR1 and RR2 hold. Then, \(RTC(\boldsymbol \theta^*) = RTC_{max}\).

\(Under~the~Hood.\) Confirm the Budget Constraint to be Binding

If the budget constraint is binding, this reduces the feasible region where one searches for the optimal parameter values. If we have good reason to suspect that an algorithm will converge where one of the parameters has a positive \(RM^2\) (so that both Conditions RR1 and RR2 hold), then we might get faster convergence by assuming the risk transfer constraint is binding.

Moreover, from the proof, one can observe that the optimal value of the Lagrange multiplier is \(RM^2_j(\boldsymbol \theta^*)\). It is interesting to note that this holds for all retention parameters where the edge condition on the decision variable holds. This leads to the following.

Balance Among Retention Parameters at the Optimum

Assume that the edge conditions for the \(i\)th and \(j\)th decision variables and the corresponding Conditions RR1 and RR2 hold. Then, \[\begin{equation} RM^2_i(\boldsymbol \theta^*) = RM^2_j(\boldsymbol \theta^*). \tag{11.9} \end{equation}\]


This result was previously hinted at in Section 2.3 where, in the simpler context of a single risk with only a few parameters, we described the natural interpretation of the risk measure relative marginal change. Recall that \(RM^2\) can be interpreted as measuring the marginal (negative) change in the risk measure per unit marginal change in the risk transfer cost. In other words, if a change in the \(j\)th parameter causes a unit change in the risk transfer cost, then \(RM_j^2\) quantifies the amount of change in the measure of uncertainty. It seems reasonable that, at the optimum, we would strive for a balance among the parameter settings. Without this balance, a change in one parameter would result in a greater (or lesser) change in the objective function, given the same change in the risk transfer cost.

Example 11.3.1. Multivariate Excess of Loss with Variance as a Risk Measure. For the variance as a risk measure, we have \[ \begin{array}{ll} \partial_{\theta_j} RM[g(\mathbf{X};\boldsymbol \theta)] & = \partial_{\theta_j} \frac{1}{2} \mathrm{Var}[g(\mathbf{X};\boldsymbol \theta)] \\ & = \frac{1}{2} \partial_{\theta_j} \mathrm{E}[g(\mathbf{X};\boldsymbol \theta)^2] - \mathrm{E}[g(\mathbf{X};\boldsymbol \theta)] \{\partial_{\theta_j} \mathrm{E}[g(\mathbf{X};\boldsymbol \theta)]\}\\ & = \mathrm{E}[g(\mathbf{X};\boldsymbol \theta)\partial_{\theta_j}g(\mathbf{X};\boldsymbol \theta)] - \mathrm{E}[g(\mathbf{X};\boldsymbol \theta)] \{ \mathrm{E}[\partial_{\theta_j}g(\mathbf{X};\boldsymbol \theta)]\} .\\ \end{array} \] For excess of loss, we require that the upper limit parameters \(u\) be nonnegative. So, the constraint \(u_j \ge 0\) is equivalent to \({\bf P}_j ~ \boldsymbol\theta \le p_{0,j}\) by taking \({\bf u} = \boldsymbol\theta\), \({\bf P}_j = -{\bf 1}_j'\), and \(p_{0,j}=0\).

For the multivariate excess of loss, we take \(g({\bf X};{\bf u}) = X_{1} \wedge u_1 + \cdots + X_{p} \wedge u_p = S({\bf u})\). With this, we have \(\partial_{u_j}g({\bf X};{\bf u}) = I(X_{j} > u_j)\). Thus, \[\begin{equation} \begin{array}{ll} RM_j^2({\bf u}) &= -\frac{\partial_{u_j} RM[S({\bf u})]}{\partial_{u_j} RTC({\bf u})} \\ &= \mathrm{E}[S({\bf u}) | X_{j} > u_j] - \mathrm{E}[S({\bf u})] \\ & = u_j - \mathrm{E}[X_j \wedge u_j ] + \sum_{i \ne j}^p \left\{ \mathrm{E}[X_i \wedge u_i | X_{j} > u_j] - \mathrm{E}[X_i \wedge u_i ]\right\} . \\ \end{array} \tag{11.10} \end{equation}\] In the case of independence, this reduces to \(RM_j^2=u_j - \mathrm{E}[X_j \wedge u_j ]\), the result for the classical problem that we developed in Section 4.1.2.
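Display (11.10) and its independence simplification can be confirmed by simulation. The sketch below uses independent Exponential risks; the scales and upper limits are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(7)
R, p = 400_000, 3
# Independent Exponential risks with illustrative scales, and upper limits u
X = rng.exponential(scale=[1.0, 2.0, 0.5], size=(R, p))
u = np.array([1.5, 2.5, 1.0])

S = np.minimum(X, u).sum(axis=1)                         # retained S(u)
for j in range(p):
    tail = X[:, j] > u[j]
    rm2_sim = S[tail].mean() - S.mean()                  # E[S(u) | X_j > u_j] - E[S(u)]
    rm2_indep = u[j] - np.minimum(X[:, j], u[j]).mean()  # u_j - E[X_j ^ u_j]
    print(j, round(rm2_sim, 3), round(rm2_indep, 3))
```

With dependent risks (for instance, simulated through a copula), the conditional-mean form \(\mathrm{E}[S({\bf u})|X_j > u_j] - \mathrm{E}[S({\bf u})]\) remains valid, while the shortcut \(u_j - \mathrm{E}[X_j \wedge u_j]\) would no longer apply.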

\(Under~the~Hood.\) Confirm the \(RM^2\) for Example 11.3.1

At the optimum, if for some risk \(u^*_j >0\) and \(0<RM_j^2({\bf u}^*) < \infty\), then the budget constraint is binding. In addition, if for another risk \(u^*_i >0\) and the risk measure relative marginal change is positive and finite, then we have a balance \(RM_i^2({\bf u}^*)=RM_j^2({\bf u}^*)\).


Example 11.3.2. Bivariate Excess of Loss with \(VaR\) as a Risk Measure. For excess of loss with two risks, the retained losses are the limited sum \(g(\mathbf{X};\boldsymbol \theta) = S(u_1,u_2)\) \(=X_1 \wedge u_1 + X_2 \wedge u_2\) where \(\boldsymbol \theta = (u_1,u_2)^{\prime}\). For simplicity, we assume a fair risk transfer cost.

\(Under~the~Hood.\) Check the \(VaR\) Balance

Using equation (11.9), the balance among retention parameters can be expressed as \[ \begin{array}{ll} I(u_1^*<z_0^*)f_2(z_0^*-u_1^*)\frac{1-C_1[F_2(z_0^*-u_1^*),F_1(u_1^*)]}{1-F_1(u_1^*)}\\ ~~~~~~~~~~~~~= I(u_2^*<z_0^*)f_1(z_0^*-u_2^*)\frac{1-C_1[F_1(z_0^*-u_2^*),F_2(u_2^*)]}{1-F_2(u_2^*)} , \end{array} \] evaluated at the optimal value \(z_0^*=F_{S(u_1^*,u_2^*)}^{-1}(\alpha)\). Here, \(C_1\) is a copula derivative defined in Appendix Section 14.1. From this expression, we see that the balance depends on the copula and hence on the dependence between risks. Further, both optimal retention levels \(u_1^*\) and \(u_2^*\) must be less than the optimal value \(F_{S(u_1^*,u_2^*)}^{-1}(\alpha)\). Yet we need \(u_1^*+u_2^* > F_{S(u_1^*,u_2^*)}^{-1}(\alpha)\). In addition:

  • Special Case of Independence. To get further insights, it is also of interest to consider the special case where the risks are independent. Here, \(C_1(u,v)=v\) and we have \[ \begin{array}{ll} I(u_1^*<z_0^*)f_2(z_0^*-u_1^*) = I(u_2^*<z_0^*)f_1(z_0^*-u_2^*). \end{array} \]

  • Special Case of Identical Distributions. As another interesting case, if the distributions are equal, then we have \[ \begin{array}{ll} I(u_1^*<z_0^*)f(z_0^*-u_1^*)\frac{1-C_1[F(z_0^*-u_1^*),F(u_1^*)]}{1-F(u_1^*)} \\ ~~~~~~~~~~~~~= I(u_2^*<z_0^*)f(z_0^*-u_2^*)\frac{1-C_1[F(z_0^*-u_2^*),F(u_2^*)]}{1-F(u_2^*)} . \end{array} \] This naturally holds true if \(u_1^*=u_2^*\) regardless of the dependence.


Example 11.3.3. Separable Contracts and Risk Measures. Now consider the case where the portfolio risk transfer costs can be subdivided into separate contracts; see equation (7.10) of Section 7.5.1 where we wrote \(g(\mathbf{X}; \boldsymbol \theta) = \sum_{j=1}^p g_j(X_j; \boldsymbol \theta_j)\). From this, let us assume that one can write the risk transfer cost as \(RTC(\boldsymbol \theta) = \sum_{j=1}^p RTC_j( \boldsymbol \theta_j)\). In the same way, we also assume that the risk measure can be subdivided additively as \(RM(\boldsymbol \theta) = \sum_{j=1}^p RM_j( \boldsymbol \theta_j)\). With these assumptions, we may write \(RM_j^2 = -\partial_{\theta_j}RM_j(\theta_j) / \partial_{\theta_j}RTC_j(\theta_j)\).

Suppose that the \(j\)th contract has an upper limit form (excess of loss). Then, from Table 11.1, \[ RM_j^2 = \left\{ \begin{array}{ll} \frac{1}{1-F_j(u_j)}I[F_j(u_j-) \le \alpha] & \text{for } VaR \\ \min\left(\frac{1}{1-\alpha},\frac{1}{1-F_j(u_j)} \right) & \text{for } ES . \\ \end{array} \right. \] Under the mild conditions required for equation (11.9), this suggests that \(F_j(u_j^*)\) is the same for each contract \(j\). That is, at the optimum, all positive upper limits are set at the same quantile level.
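To see the common-quantile result concretely, consider two separate excess-of-loss contracts on Exponential risks (the scales and budget below are illustrative), with \(\alpha\) large enough that the \(1/[1-F_j(u_j)]\) branch applies. Balance forces a common level \(p = F_j(u_j^*)\), and the binding budget then determines \(p\).

```python
import numpy as np
from scipy.optimize import brentq

# Two separate excess-of-loss contracts on Exponential risks (illustrative).
betas = np.array([1.0, 3.0])        # Exponential scales, E(X_j) = beta_j
RTC_max = 0.4

# With a common quantile level p, u_j = -beta_j*log(1 - p) and the fair cost
# of transferring the tail above u_j is E(X_j) - E(X_j ^ u_j) = beta_j*(1 - p).
total_cost = lambda p: (betas * (1.0 - p)).sum() - RTC_max
p_star = brentq(total_cost, 0.0, 1.0 - 1e-12)   # binding budget pins p
u_star = -betas * np.log(1.0 - p_star)

print(p_star, u_star)               # common quantile level, differing limits
```

Here \(p^* = 0.9\), so any \(\alpha > 0.9\) (say \(\alpha = 0.95\)) keeps both contracts on the \(1/[1-F_j(u_j)]\) branch; the limits themselves differ (roughly 2.30 and 6.91), but both sit at the same quantile of their own distributions.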


The assumption of separable risk measures in Example 11.3.3 is unlikely to hold in general. However, it is true when using, for example, \(VaR\) and \(ES\) as risk measures and when risks are comonotonic (see Section 9.1 or, for a broader introduction, Denuit et al. (2006), Chapter 2). Comonotonicity represents extreme positive dependence and so is unlikely to hold in practice. However, it may be useful to make this assumption to quickly generate upper limit values and then use these as starting values for general situations.

11.3.2 Risk Retention Boundary Conditions

To simplify the presentation, this subsection only considers edge conditions of the form \(\theta_j \ge 0\). In this context, one might ask under what conditions one of the retention parameters equals 0. To illustrate, suppose that it is known that \(\theta_1^*>0\) and we would like conditions that lead to \(\theta_2^* = 0\). To this end, from equation (11.7) \[ \begin{array}{ll} \partial_{\theta_2} LA(\boldsymbol \theta, {\bf LMI}) &= ~\partial_{\theta_2}RM(\boldsymbol \theta) +LMI_1 ~\partial_{\theta_2}RTC(\boldsymbol \theta) - LMI_3\\ &= \partial_{\theta_2}RTC(\boldsymbol \theta)\left(\frac{\partial_{\theta_2}RM(\boldsymbol \theta)}{\partial_{\theta_2}RTC(\boldsymbol \theta)} ~ +RM^2_1(\boldsymbol \theta^*) \right) - LMI_3\\ &= \partial_{\theta_2}RTC(\boldsymbol \theta)\left(RM_1^2-RM_2^2 \right) - LMI_3.\\ \end{array} \] So, at the optimum, if \(\partial_{\theta_2}RTC(\boldsymbol \theta)\left(RM_1^2-RM_2^2\right)\) is positive, then by the first KKT condition, \(LMI_3^*\) is positive. By the third condition, this means that \(\theta_2^* = 0\). We summarize this as follows.

Risk Retention Boundary Conditions

Assume \(\theta_i^* > 0\) for the \(i\)th decision variable and the corresponding Conditions RR1 and RR2 hold. In addition, suppose that either

  • \(\partial_{\theta_j}RTC(\boldsymbol \theta) >0\) and \(RM_i^2 > RM_j^2\) or
  • \(\partial_{\theta_j}RTC(\boldsymbol \theta) <0\) and \(RM_i^2 < RM_j^2\),

at \(\boldsymbol \theta=\boldsymbol \theta^*\). Then \(\theta_j^* = 0\).


In summary, from equation (11.9), we have a collection of \(RM^2\) ratios that are the same. If a ratio is not equal to the collective number, then there are mild conditions so that the corresponding retention parameter is zero.

In many of our problems, we focus on cases where an increase in a parameter \(\theta\) means that more risk is retained, such as with upper limits and coinsurances. More risk retained means that as \(\theta\) increases, we expect risk transfer costs to decrease and our (risk) measure of retained risk to increase. However, for other problems, such as deductibles, an increase in a parameter means that less risk is retained.

Example 11.3.4. Multivariate Deductible with Variance as a Risk Measure. As in Example 11.3.1, the partial derivative of the risk measure is \[ \begin{array}{ll} \partial_{\theta_j} RM[g(\mathbf{X};\boldsymbol \theta)] & = \mathrm{E}[g(\mathbf{X};\boldsymbol \theta)\partial_{\theta_j} g_{j}(\mathbf{X};\boldsymbol \theta)] - \mathrm{E}[g(\mathbf{X};\boldsymbol \theta)] \{ \mathrm{E}[\partial_{\theta_j} g_{j}(\mathbf{X};\boldsymbol \theta)]\} .\\ \end{array} \] For the multivariate deductible, we take \(d_j = \theta_j\) and \[ \begin{array}{ll} g({\bf X};{\bf d}) &= (X_{1}-d_1)_+ + \cdots + (X_{p} - d_p)_+ \\ &= [X_1 - X_{1} \wedge d_1] + \cdots + [X_p - X_{p} \wedge d_p ] =S(\boldsymbol \infty) -S({\bf d}). \end{array} \] With this, we have \(\partial_{d_j}g({\bf X};{\bf d}) = -I(X_{j} > d_j)\). Further calculations show \[\begin{equation} \begin{array}{ll} RM_j^2({\bf d}) &= -\frac{\partial_{d_j} RM[S({\bf d})]}{\partial_{d_j} RTC({\bf d})} \\ & = d_j - \mathrm{E}[X_j \wedge d_j ] + \sum_{i \ne j}^p \left\{ \mathrm{E}[X_i \wedge d_i | X_{j} > d_j] - \mathrm{E}[X_i \wedge d_i ]\right\} . \\ \end{array} \tag{11.11} \end{equation}\] This is the same \(RM^2\) measure as in Example 11.3.1. The only difference between the two problems is the sign of the partial derivative of the risk transfer cost, \(\partial_{\theta_j} RTC(\boldsymbol \theta)\). So, when we examine the boundary conditions for the deductible problem, a small value of \(RM_j^2({\bf d}^*)\) at the optimum means that \(d_j^* = 0\), signifying full retention.

\(Under~the~Hood.\) Confirm the \(RM^2\) for Example 11.3.4

Suppose that all of the parameters adhere to the assumption that retained risk increases with a parameter increase. Additionally, if all of the parameters equal 0, that is, if \(\theta_1 = \cdots = \theta_p = 0\), then there is no risk retention, corresponding to full transfer (which could be full insurance). This is illustrated in Examples 11.4.1 - 11.4.4 of the subsequent Section 11.4.1. In this scenario, we can set the maximal risk transfer cost below the case of no risk retention, \(RTC_{max} < RTC(\mathbf{0})\), ensuring that we have some positive retention parameters. Under this assumption, at least one \(\theta_j^*>0\), satisfying one of the basic ingredients for risk retention conditions to hold.

Re-parameterization

A parameter at a boundary has a natural interpretation, such as full transfer for upper limit and coinsurance parameters, and full retention for deductible parameters. In some cases, analysts can re-parameterize (or redefine) parameters to achieve alternative desirable interpretations.

For instance, if the \(j\)th risk has upper limit \(u_j < \infty\), then we may re-parameterize the deductible as \(d_{1j} = (u_j - d_j)_+\), representing the amount that the deductible \(d_j\) falls below the upper limit. The new parameter \(d_{1j}\) has zero risk retention when \(d_{1j}=0\) and aligns with our objective of taking on more risk as the parameter increases. In such cases, we can interpret the boundary condition result \(\theta_j^* = 0\) to indicate no retention, or full transfer, of the \(j\)th risk.

The re-parameterization tactic can be applied to identify conditions for full risk retention in other scenarios. For example, in a multivariate excess of loss policy, if an upper limit \(u_j\) is associated with a risk, then \(u_j=0\) implies no risk retention, whereas \(u_j=\infty\) signifies full retention. Therefore, we might define a new parameter, the reciprocal of the upper limit, \(u_j^R = 1/u_j\). We could then identify conditions under which the boundary result \(u_j^R = 0\) holds, signifying full risk retention.

11.4 Risk Measure Relative Marginal Changes

As seen in Section 11.3, the risk measure relative marginal change, \(RM_j^2\), plays a key role in the investigation of risk transfer conditions. Because practical applications involve simulation of multivariate risks, this section develops \(RM_j^2\) metrics in this context.

Separate Contract

Because the focus of \(RM^2_j\) is on marginal changes, we can extend slightly the idea of separable contracts introduced in Section 7.5.1 to specify that only the \(j\)th contract is separate. Mathematically, define \(\boldsymbol \theta_{(j)}= (\theta_1, \ldots, \theta_{j-1}, \theta_{j+1}, \ldots, \theta_p)'\) to be the vector of parameters excluding the \(j\)th one and similarly for \({\bf X}_{(j)}\). Then, we could write the retention function for the \(j\)th risk as \[ \tilde{g}_j(X_j; \theta_j) = g({\bf X};\boldsymbol \theta) - g_{(j)}({\bf X}_{(j)};\boldsymbol \theta_{(j)}) , \] where \(g_{(j)}(\cdot)\) is the retention function for all risks except the \(j\)th one. Unlike in Section 7.5.1, we do not require that \(g_{(j)}(\cdot)\) be additive.

If the \(j\)th risk is separate, then determination of partial derivatives of risk retention becomes easier as they rely on only one parameter. Table 11.2 summarizes this calculation for different types of retentions.

Table 11.2. Separate Risk Retention Functions and Partial Derivatives

\[ {\small \begin{array}{l|l|l} \hline \text{Retention Type} & \text{Retention Function} & \text{Partial Derivative} \\ & ~~~~~~~~~ g({\bf X}; \theta) & ~~~~~~~~~ \partial_{\theta} ~g({\bf X}; \theta) \\ \hline \text{Deductible} & g(x; d) ~~~~= (x-d)_+ & \partial_d ~g(x; d) ~~~~= - I(x>d)\\ \text{Coinsurance} & g(x; c) ~~~~= c ~x & \partial_c ~g(x; c) ~~~~=x\\ \text{Upper Limit} & g(x; u) ~~~~= x \wedge u & \partial_u ~g(x; u)~~~~= I(x>u)\\ \text{Upper Limit with } & g(x; u_R)~= x \wedge \frac{1}{u_R} & \partial_{u_R} ~g(x; u_R)= \frac{-1}{u_R^2} I(x>\frac{1}{u_R})\\ ~~~~\text{Reciprocal Parameter} \\ \hline \end{array} } \]
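The Table 11.2 derivatives are easy to confirm by finite differences at any point where the retention function is differentiable; the evaluation values below are illustrative.

```python
# Finite-difference check of the Table 11.2 partial derivatives, at a point
# where each retention function is differentiable (values are illustrative).
x, d, c, u, uR, h = 2.0, 0.7, 0.6, 1.5, 0.6, 1e-6

fd = lambda f, t: (f(t + h) - f(t - h)) / (2 * h)

assert abs(fd(lambda d: max(x - d, 0.0), d) + (x > d)) < 1e-6   # deductible: -I(x>d)
assert abs(fd(lambda c: c * x, c) - x) < 1e-6                   # coinsurance: x
assert abs(fd(lambda u: min(x, u), u) - (x > u)) < 1e-6         # upper limit: I(x>u)
assert abs(fd(lambda r: min(x, 1 / r), uR)                      # reciprocal form
           - (-1 / uR**2) * (x > 1 / uR)) < 1e-3
print("Table 11.2 derivatives confirmed at x =", x)
```

The derivatives fail to exist exactly at the kinks (\(x = d\), \(x = u\), or \(x = 1/u_R\)), which is why the smoothing discussed in Section 11.4.2 matters for quantile-based measures.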

11.4.1 Variance Risk Retention

Using the variance as a risk measure, we are able to utilize the usual empirical simulation estimators of the distribution because the variance is differentiable in the risk retention parameters. Readers will not be surprised that we can get some intuitively pleasing results using the classic assumption of the variance as a risk measure. To recap, here is a summary of the simulation version of this risk retention problem. \[\begin{equation} \boxed{ \begin{array}{lc} {\small \text{minimize}_{\boldsymbol \theta}} & RM_{var}(\boldsymbol \theta) =\frac{1}{2R} \sum_{r=1}^R \left\{g({\bf X}_{r};\boldsymbol \theta) -\overline{g({\bf X}_{R};\boldsymbol \theta)}\right\}^2\\ {\small \text{subject to}} & ~~~~~RTC_R(\boldsymbol \theta) \le RTC_{max} . \end{array} } \tag{11.12} \end{equation}\] For notation, let \(g_{rj} =\partial_{\theta_j}g({\bf X}_{r};\boldsymbol \theta)\) and \(\bar{g}_{j}=\frac{1}{R} \sum_{r=1}^R g_{rj}\). Also, let the average risk retained be denoted as \(\overline{g({\bf X}_{R};\boldsymbol \theta)}\). With this, we can express the risk measure relative marginal statistic as \[\begin{equation} \begin{array}{ll} RM_{var,j}^2(\boldsymbol \theta) &= -\frac{\partial_{\theta_j} ~RM_{var}(\boldsymbol \theta) }{\partial_{\theta_j}RTC_R(\boldsymbol \theta)} = {\large \frac{ \mathrm{E}_R[ g({\bf X}_{R};\boldsymbol \theta) g_{Rj}] }{\bar{g}_{j}} } - \overline{g({\bf X}_{R};\boldsymbol \theta)} .\\ \end{array} \tag{11.13} \end{equation}\] Here, the first term on the right-hand side of equation (11.13) is a weighted average of retained risks where the weights are given by the partial derivatives \(g_{rj}\). It is worth noting that if we are comparing \(RM_i^2\) to \(RM_j^2\) to determine the balance in the system, only the weighted average is important as the other term, the average retained risk, is the same for the \(i\)th and \(j\)th measures.
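As a check on this statistic, the sketch below (gamma risks and coinsurance values are illustrative assumptions) approximates \(RM^2_{var,j}\) directly from its definition, \(-\partial_{\theta_j}RM_{var}/\partial_{\theta_j}RTC_R\), by finite differences, and compares it with the covariance form \(\mathrm{Cov}_R(g, g_{Rj})/\bar{g}_j\) obtained by differentiating the simulation variance.

```python
import numpy as np

rng = np.random.default_rng(11)
R = 100_000
X = rng.gamma(shape=2.0, scale=1.0, size=(R, 2))       # illustrative risks
c = np.array([0.7, 0.4])                               # coinsurance parameters

g = lambda c: X @ c                                    # retained g = c1*X1 + c2*X2
RM = lambda c: 0.5 * g(c).var()                        # simulation variance measure
RTC = lambda c: X.sum(axis=1).mean() - g(c).mean()     # fair transfer cost

j, h = 0, 1e-5
e = np.eye(2)[j] * h
rm2_fd = -(RM(c + e) - RM(c - e)) / (RTC(c + e) - RTC(c - e))

# Covariance form: Cov_R(g, g_j) / mean(g_j), with g_j = X_j for coinsurance
grj = X[:, j]
rm2_cov = np.cov(g(c), grj)[0, 1] / grj.mean()
print(rm2_fd, rm2_cov)
```

Because the variance is quadratic and the fair cost is linear in the coinsurance parameters, the central difference is essentially exact here; the two numbers agree up to the degrees-of-freedom convention in `np.cov`.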

\(Under~the~Hood.\) Confirm the \(RM^2\) for the Simulation Variance Risk Measure

Interpretations of this result are best seen in the context of some special cases.

Example 11.4.1. Coinsurance. For the \(j\)th risk, take \(c_j = \theta_j\). Thus, from Table 11.2 we have \(g_{rj} = X_{rj}\). With equation (11.13), in this case we can express the risk measure relative marginal change as \[ \begin{array}{ll} RM_{var,j}^2 = {\large \frac{ \mathrm{E}_R[ g({\bf X}_{R};\boldsymbol \theta) X_{Rj}] }{\overline{X}_{Rj}} } - \overline{g({\bf X}_{R};\boldsymbol \theta)} .\\ \end{array} \] Thus, the \(j\)th risk measure relative marginal change is a weighted average of retained risks, with weights given by the size of the \(j\)th risk, minus the average retained risk.


Example 11.4.2. Excess of Loss. Now let \(u_j = \theta_j\). Using Table 11.2, we have \(g_{rj} = I(X_{rj} > u_j)\). Then, equation (11.13) becomes \[ \begin{array}{ll} RM_{var,j}^2 & =\overline{g({\bf X}_{R};\boldsymbol \theta)} - {\large \frac{ \mathrm{E}_R[ g({\bf X}_{R};\boldsymbol \theta) g_{Rj}] }{\bar{g}_{j}} } \\ &= \overline{g({\bf X}_{R};\boldsymbol \theta)} - \frac{ \sum_{r=1}^R g({\bf X}_{r};\boldsymbol \theta) I(X_{rj} > u_j) }{ \sum_{r=1}^R I(X_{rj} > u_j)} .\\ \end{array} \] The \(j\)th risk measure relative marginal change can be expressed as the overall average minus the average portfolio risk over those losses that exceed the upper limit.
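The excess-of-loss version can be sketched the same way. The lognormal losses and upper limits below are invented, and the retained risk is taken to be the sum of per-risk limited losses \(\sum_j \min(X_{rj}, u_j)\), an assumption consistent with \(g_{rj} = I(X_{rj} > u_j)\):

```python
import numpy as np

rng = np.random.default_rng(7)
R = 10_000
X = rng.lognormal(mean=0.0, sigma=1.0, size=(R, 3))  # hypothetical losses
u = np.array([2.0, 3.0, 4.0])                         # upper limits theta_j = u_j

g = np.minimum(X, u).sum(axis=1)   # retained risk: each loss capped at u_j
exceed = X > u                      # g_rj = I(X_rj > u_j)

# average portfolio risk over those draws where the j-th loss pierces u_j
cond_avg = (g[:, None] * exceed).sum(axis=0) / exceed.sum(axis=0)

RM2_var = g.mean() - cond_avg
```

On the event \(X_{rj} > u_j\), the \(j\)th retained component equals its cap \(u_j\), so the conditional average exceeds the overall average in this simulation.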

11.4.2 Quantile-Based Risk Retention

Using the KKT conditions yields desirable properties that, when applied to risk retention problems, can be described in terms of the risk measure relative marginal change, \(RM^2\). However, both the KKT conditions and the \(RM^2\) metric are based on marginal derivatives that rely on smoothness in the risk retention parameters. Basic simulation techniques assign a weight of \(1/R\) to each simulated outcome, and when used with typical check or indicator functions, the resulting simulated approximations are no longer smooth.

Unlike the variance, the usual empirical simulation approximations of quantile-based risk measures such as the \(VaR\) and the \(ES\) are not differentiable in the risk retention parameters. Therefore, technically, we need to use smooth estimators of the distribution introduced in Section 7.3, with properties developed in Section 10.5.3. Recall the density, distribution function, and general expectations from those sections as given in Table 11.3.

Table 11.3. Kernel Smoothed Expressions

\[ {\small \begin{array}{l|c|l} \hline\hline \textbf{Term} & \textbf{Symbol} & ~~~~~~~~\textbf{Expression} \\ \hline \text{Density} & f_{Rk}(y) &= \frac{1}{R~b} \sum_{r=1}^R k\left(\frac{y - g({\bf X}_r; \boldsymbol \theta)}{b}\right) \\\hline \text{Distribution function} & F_{Rk}(y) &= \frac{1}{R} \sum_{r=1}^R K\left(\frac{y - g({\bf X}_r; \boldsymbol \theta)}{b}\right) \\ &&~~~~= \mathrm{E}_R \left\{ K\left(\frac{y - g({\bf X}_R; \boldsymbol \theta)}{b}\right)\right\} \\\hline \text{General expectation} &\mathrm{E}_{Rk}\{h[g({\bf X}_R; \boldsymbol \theta)]\} & ={\LARGE \int} \mathrm{E}_{R} \left\{ h[g({\bf X}; \boldsymbol \theta) + bz] \right\} ~k(z) dz \\ ~~~\text{of retained risk}&\\ \hline \text{Partial derivative of the}& F_{Rk,\theta_j}(y) &= \frac{1}{R} \sum_{r=1}^R \partial_{\theta_j}K\left(\frac{y - g({\bf X}_r; \boldsymbol \theta)}{b}\right) \\ ~~~\text{distribution function}& \\ ~~~\textit{wrt} \text{ a retention parameter} &&~~~~= \frac{-1}{b} \mathrm{E}_{R} \left\{ k\left(\frac{y - g({\bf X}_R; \boldsymbol \theta)}{b}\right) g_{Rj} \right\} \\\hline\hline \end{array} } \] From these expressions, one can determine the quantile, or value at risk, in the usual way; it is denoted as \(VaR_{Rk}\). In addition, the above display provides the partial derivative of the distribution function with respect to a retention parameter, which is needed in the following.
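The kernel-smoothed quantities in Table 11.3 can be sketched with a normal reference kernel as follows. This is an illustrative implementation, not the text's code; the retained-risk draws and bandwidth used below are placeholders:

```python
import numpy as np
from scipy.stats import norm

def F_Rk(y, g, b):
    """Kernel-smoothed distribution function of retained risk (Table 11.3)."""
    return norm.cdf((y - g) / b).mean()

def f_Rk(y, g, b):
    """Kernel-smoothed density of retained risk."""
    return (norm.pdf((y - g) / b) / b).mean()

def F_Rk_theta(y, g, g_j, b):
    """Partial derivative of F_Rk wrt a retention parameter, given the
    per-draw derivatives g_j = dg(X_r; theta)/dtheta_j."""
    return -(norm.pdf((y - g) / b) * g_j).mean() / b

# placeholder retained-risk draws and bandwidth for illustration
g_draws = np.linspace(0.0, 1.0, 101)
b = 0.05
```

Here the normal distribution plays the role of the reference kernel \(K\), with \(k\) its density.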

For the \(VaR\), recall that in Section 5.3.1 we developed an expression for the quantile sensitivity but that development assumed smoothness in the argument of the distribution function as well as the risk retention parameters. This result in equation (5.5), with a smoothed empirical estimator of the distribution, can be expressed as \[ \begin{array}{ll} \partial_{\theta_j} VaR_{Rk}(\boldsymbol \theta) &= \frac{-1}{f_{Rk}[VaR_{Rk}]} F_{Rk,\theta_j}[VaR_{Rk}] . \end{array} \] We also need partial derivatives with respect to the risk transfer cost. From the general expectation for retained risks, one can see that \(\partial_{\theta_j}~RTC_R(\boldsymbol \theta) = - \bar{g}_j\).

\(Under~the~Hood.\) Confirm the Partial Derivative of \(RTC\)

Thus, the risk measure relative marginal change for the value at risk is \[ \begin{array}{ll} RM_{VaR,j}^2(\boldsymbol \theta) &= -\frac{\partial_{\theta_j} ~VaR_{Rk}(\boldsymbol \theta) }{\partial_{\theta_j}RTC_R(\boldsymbol \theta)} \\ &= \frac{1}{\bar{g}_j ~f_{Rk}[VaR_{Rk}]} F_{Rk,\theta_j}[VaR_{Rk}] . \end{array} \]
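A numerical sketch of these quantities under coinsurance follows. The gamma losses, retention parameters, bandwidth, and confidence level are invented; the smoothed \(VaR_{Rk}\) is found by root-finding on the smoothed distribution function:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(3)
R, b, alpha = 20_000, 0.10, 0.95
X = rng.gamma(2.0, 1.5, size=(R, 2))   # hypothetical losses
c = np.array([0.6, 0.8])                # coinsurance parameters
g = X @ c                               # retained risk

F = lambda y: norm.cdf((y - g) / b).mean()          # smoothed cdf
f = lambda y: (norm.pdf((y - g) / b) / b).mean()    # smoothed density

# smoothed value at risk: solve F(y) = alpha
VaR_Rk = brentq(lambda y: F(y) - alpha, g.min(), g.max())

# F_{Rk,theta_j}(VaR_Rk), with g_rj = X_rj under coinsurance
F_theta = -(norm.pdf((VaR_Rk - g) / b)[:, None] * X).mean(axis=0) / b

dVaR = -F_theta / f(VaR_Rk)             # quantile sensitivity, equation (5.5)
RM2_VaR = F_theta / (X.mean(axis=0) * f(VaR_Rk))
```

Raising a retention proportion increases the retained risk, so the quantile sensitivity is positive in this simulation.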

For the expected shortfall, we use the \(ES\) sensitivity in equation (5.7). With this, we have \[ \begin{array}{ll} \partial_{\theta_j} ~ES_{Rk}[g(\mathbf{X};\boldsymbol \theta)] \\ ~~~~=\frac{1}{1-\alpha} \left\{\partial_{\theta_j}\mathrm{E}_{Rk}[g(\mathbf{X};\boldsymbol \theta)] -\left.\partial_{\theta_j}\mathrm{E}_{Rk}[g(\mathbf{X};\boldsymbol \theta) \wedge z_0]\right|_{z_0=VaR_{Rk}}\right\} \\ ~~~~~~~~~~~~~ + [1 - \frac{1}{1-\alpha} \{1-F_{Rk}(VaR_{Rk})\}] \times \partial_{\theta_j} VaR_{Rk}(\boldsymbol \theta) . \end{array} \] Because the distribution function is smooth, in many cases there is no discreteness at the quantile, so \(F_{Rk}(VaR_{Rk})=\alpha\), and the second term on the right-hand side is zero. Nonetheless, sometimes discreteness is induced by the risk retention parameters, as seen in Example 2.5.1. With no discreteness, the risk measure relative marginal change for the expected shortfall can be written as \[ \begin{array}{ll} RM_{ES,j}^2(\boldsymbol \theta) &=\frac{1}{(1-\alpha)} {\Large \frac{\mathrm{E}_R \left\{\tilde{k}_{R\theta} ~g_{Rj}\right\} }{\bar{g}_j} }~~,\\ \end{array} \tag{11.14} \] where \(\tilde{k}_{R\theta}= 1-K\left([VaR_{Rk}-g({\bf X}_R; \boldsymbol \theta)]/b\right)\). For the \(r\)th simulated value, \(\tilde{k}_{r\theta}\) represents the probability that the reference (such as the normal) distribution exceeds the argument \([VaR_{Rk}-g({\bf X}_r; \boldsymbol \theta)]/b\). Thus, large values of \(g({\bf X}_r; \boldsymbol \theta)\) mean that \(\tilde{k}_{r\theta}\) is large, so it can be interpreted as a measure of the size of the retained risk.
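Equation (11.14) can likewise be sketched numerically. As before, the losses, coinsurance parameters, bandwidth, and confidence level are invented for illustration, with the normal distribution as the reference kernel:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(5)
R, b, alpha = 20_000, 0.10, 0.95
X = rng.gamma(2.0, 1.5, size=(R, 2))   # hypothetical losses
c = np.array([0.6, 0.8])                # coinsurance parameters
g = X @ c                               # retained risk

F = lambda y: norm.cdf((y - g) / b).mean()          # smoothed cdf
VaR_Rk = brentq(lambda y: F(y) - alpha, g.min(), g.max())

# tilde-k weights: smoothed indicator that the retained risk exceeds VaR
k_tilde = 1.0 - norm.cdf((VaR_Rk - g) / b)

# equation (11.14), with g_rj = X_rj under coinsurance
RM2_ES = (k_tilde[:, None] * X).mean(axis=0) / ((1 - alpha) * X.mean(axis=0))
```

By construction, the \(\tilde{k}_{r\theta}\) weights average to \(1-F_{Rk}(VaR_{Rk}) = 1-\alpha\), so the \(RM^2_{ES}\) statistics are positive weighted averages.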

\(Under~the~Hood.\) Confirm the Expected Shortfall \(RM^2\)

Example 11.4.3. Coinsurance. As in Example 11.4.1, take \(c_j = \theta_j\) and \(g_{rj} = X_{rj}\) for the \(j\)th risk. In this case, equation (11.14) becomes \[ \begin{array}{ll} RM_{ES,j}^2(\boldsymbol \theta) &=\frac{1}{(1-\alpha)} \frac{1}{\bar{g}_j} \mathrm{E}_R \left\{\tilde{k}_{R\theta} ~g_{Rj}\right\} ~~,\\ &=\frac{1}{(1-\alpha)} {\large \frac{\sum_{r=1}^R \tilde{k}_{r\theta} ~X_{rj} }{\sum_{r=1}^R X_{rj}} }~~.\\ \end{array} \] This is proportional to the weighted average of the measure of size of retained risk, where the weight is given by the \(j\)th loss.


Example 11.4.4. Upper Limit. As in Example 11.4.2, take \(u_j = \theta_j\) and \(g_{rj} = I(X_{rj} > u_j)\) for the \(j\)th risk. Now, equation (11.14) becomes \[ \begin{array}{ll} RM_{ES,j}^2(\boldsymbol \theta) &=\frac{1}{(1-\alpha)} {\large \frac{\sum_{r=1}^R \tilde{k}_{r\theta} ~I(X_{rj} > u_j) }{\sum_{r=1}^R I(X_{rj} > u_j) } }~~.\\ \end{array} \] This is proportional to the average of the measure of size of retained risk, where the average is taken over large values of the \(j\)th risk, those exceeding the upper limit \(u_j\).


Example 11.4.5. Varying the Cyber Risk Premium. This example continues from Exercise 8.2.4, which analyzed the ANU Excess of Loss problem with a market loading \(Cyber~Load\) given in Table 11.4. In Exercise 8.2.4, readers were tasked with minimizing the expected shortfall and determining optimal upper limits from which risk measures were computed.

Table 11.4 displays selected optimal retention levels. As in Exercise 8.2.4, the results are pleasing. A lower value of \(Cyber~Load\) (at half the fair cost) suggests full insurance, indicated by the optimal upper limit \(u_3^*=0\). Conversely, a high value of \(Cyber~Load\) (at double the fair cost) extends the optimal upper limit \(u_3^*\) well beyond the 99th risk percentile, suggesting full retention. In this example, we supplement this analysis by computing the \(ES\) risk measure relative marginal change (\(RM^2\)).

To assess the simulation accuracy of the \(RM^2\), note that it can be expressed as the ratio of two averages. The theory underpinning the standard errors is sketched out in Exercise 11.4. For this example, the standard errors that appear in Table 11.4 (with \(R=100000\)) enable the assessment of accuracy.

Code for Example 11.4.5
Table 11.4: ANU Cyber Risk Retention
\[ {\small \begin{array}{c|c|c|c|c|c|c} \hline\hline \text{Cyber Load} & VaR & u_3\text{: Cyber} & RM_3^2 & se(RM_3^2) & u_2 & u_4 \\ \hline 0.5 & 6093 & 0 & 0.395 & 0.005 & 175 & 288 \\ 1.0 & 6226 & 432 & 0.327 & 0.005 & 186 & 252 \\ 1.5 & 6357 & 1506 & 0.295 & 0.008 & 198 & 144 \\ 2.0 & 6361 & 5633 & 0.338 & 0.055 & 204 & 145 \\ 2.5 & 6364 & 5848 & 0.386 & 0.014 & 204 & 147 \\ 3.0 & 6364 & 5848 & 0.321 & 0.011 & 204 & 147 \\ \hline\hline \end{array} } \]

In addition, Table 11.5 shows risk measure relative marginal changes for the other risks, \(RM^2_j\). Corresponding standard errors were also computed; although not displayed, they were at approximately the same level as those for the cyber risk displayed in Table 11.4. One can see a reasonable balance among the \(RM^2\) metrics for most risks (although not the 13th and 15th) for \(Cyber~Load\) values of 1.0 and 1.5. For other values of the \(Cyber~Load\), as well as for risks 13 and 15 (corresponding to Motor Vehicle and Marine Hull), there is a lack of balance, suggesting that the corresponding upper limits may be in doubt.

Table 11.5: ANU Cyber Risk Measure Relative Marginal Change
\[ {\small \begin{array}{c|cccccccccc} \hline\hline \text{Cyber Load} & RM^2_{1} & RM^2_{2} & RM^2_3\text{: Cyber} & RM^2_{4} & RM^2_{5} & RM^2_{6} & RM^2_{9} & RM^2_{10} & RM^2_{13} & RM^2_{15} \\ \hline 0.5 & 0.233 & 0.303 & 0.395 & 0.325 & 0.317 & 0.310 & 0.328 & 0.329 & 0.267 & 0.336 \\ 1.0 & 0.234 & 0.303 & 0.327 & 0.319 & 0.317 & 0.310 & 0.324 & 0.330 & 0.245 & 0.386 \\ 1.5 & 0.235 & 0.290 & 0.295 & 0.309 & 0.306 & 0.304 & 0.309 & 0.319 & 0.290 & 0.353 \\ 2.0 & 0.233 & 0.291 & 0.338 & 0.308 & 0.307 & 0.304 & 0.308 & 0.300 & 0.290 & 0.361 \\ 2.5 & 0.233 & 0.291 & 0.386 & 0.309 & 0.307 & 0.303 & 0.308 & 0.300 & 0.254 & 0.353 \\ 3.0 & 0.233 & 0.291 & 0.321 & 0.308 & 0.307 & 0.303 & 0.309 & 0.299 & 0.258 & 0.353 \\ \hline\hline \end{array} } \]

11.5 Supplemental Materials

11.5.1 Further Resources and Reading

See, for example, Boyd and Vandenberghe (2004) and Simon and Blume (1994) for fascinating introductions to constrained optimization and more details about the Lagrange multiplier method.

11.5.2 Exercises

Exercise 11.1. de Finetti Optimal Retention Proportions. Consider the quota share agreement described in Section 4.3.2. Here, the insurer’s portion of the portfolio risk is \(Y_{insurer} = \sum_{i=1}^n c_i X_i\) \(= \mathbf{c}^{\prime} \mathbf{X}\). We seek to find those values of \(c_i\) that minimize \(\mathrm{Var}(Y_{insurer})\) subject to the constraint that \(\mathrm{E}(Y_{reinsurer}) = RTC_{max}\). Subject to this budget constraint, the insurer wishes to minimize the uncertainty of the retained risks as measured by the variance. We now further impose constraints that the sharing coefficients are bounded between 0 and 1, as considered by De Finetti (1940) (although he assumed independence among risks).

See also Glineur and Walhin (2006) and Pressacco, Serafini, and Ziani (2011) for additional background. \[ \boxed{ \begin{array}{lc} {\small \text{minimize}}_{c_1, \ldots, c_n} & \frac{1}{2} \mathrm{Var} (Y_{insurer}) = \frac{1}{2}\mathbf{c}^{\prime} \boldsymbol \Sigma \mathbf{c} \\ {\small \text{subject to}} & \sum_i (1-c_i)\mathrm{E} (X_i) \le RTC_{max} \\ & 0 \le c_i \le 1, \ \ \ \ \ \ i=1, \ldots, n, \end{array} } \]

a. Put the problem into standard form and develop the Lagrangian. Henceforth, assume zero correlations among risks and write the variance of the \(i\)th risk as \(\sigma_i^2\). Verify that the KKT conditions can be expressed as: \[\begin{equation} \boxed{ \begin{array}{cl} \sigma_i^2 c_i^* - LMI_1^*~ [\mathrm{E} ~X_i] + LMI_{2i}^* -LMI_{3i}^* = 0 & i=1, \ldots, n \\ LMI_{2i}^*\ge 0, LMI_{3i}^* \ge 0 & i=1, \ldots, n \\ LMI_{2i}^* (c_i^*-1) = 0, LMI_{3i}^* c_i^* = 0 & i=1, \ldots, n \\ 0 \le c_i^* \le 1 & i=1, \ldots, n \\ LMI_1^* \ge 0, \sum_i (1-c_i^*)\mathrm{E} (X_i) \le RTC_{max} & \\ LMI_1^*\left( \sum_i (1-c_i^*)\mathrm{E} (X_i) - RTC_{max} \right) = 0 & \\ \end{array} } \tag{11.15} \end{equation}\]

b. To get some practice, let us make some assumptions about the optimal coefficients and use the KKT conditions to verify how we expect the Lagrange multipliers to behave. Verify: \[\begin{equation} \boxed{ \begin{array}{c|cc} \text{Suppose} & \textit{KKT }\text{Results} \\ \hline 0 < c_i^* < 1 &LMI_{2i}^* = LMI_{3i}^*= 0 & \sigma_i^2 c_i^* - LMI_1^*~ [\mathrm{E} ~X_i] = 0 \\ c_i^* =0 & LMI_{2i}^* = LMI_{3i}^*= 0 & LMI_1^*= 0\\ c_i^* =1 &LMI_{3i}^* = 0& LMI_{2i}^* = LMI_1^*~ [\mathrm{E} ~X_i] - \sigma_i^2 . \\ \end{array} } \tag{11.16} \end{equation}\]

c. For another approach, let us make some assumptions about the Lagrange multipliers and use the KKT conditions to verify how we expect the optimal coefficients to behave. Verify: \[\begin{equation} {\scriptsize \boxed{ \begin{array}{cl|cl} \text{Suppose} & &\textit{KKT }\text{Results} \\ \hline LMI_{3i}^* > 0 & &c_i^* =0 &LMI_{2i}^* = 0\\ LMI_{2i}^* > 0 & &c_i^* =1 &LMI_{3i}^* = 0\\ LMI_{2i}^* = LMI_{3i}^* = 0 && \sigma_i^2 c_i^* - LMI_1^*~ [\mathrm{E} ~X_i] = 0 \\ \hline LMI_1^* = 0 & &LMI_{2i}^* = LMI_{3i}^* = 0 &c_i^* =0\\ \hline LMI_1^* > 0 & LMI_{3i}^* > 0 & \text{violates conditions} \\ LMI_1^* > 0 & LMI_{2i}^* > 0 & c_i^* =1 & \sigma_i^2- LMI_1^*~ [\mathrm{E} ~X_i] \\ & & & ~~~+ LMI_{2i}^* = 0\\ LMI_1^* > 0 & LMI_{2i}^* = &\sigma_i^2 c_i^* - LMI_1^*~ [\mathrm{E} ~X_i] = 0 & \\ & ~~LMI_{3i}^* = 0&& \\ \end{array} } \tag{11.17} } \end{equation}\]

d. Check the following. To determine the optimal coefficients, we first find the value of Lagrange multiplier \(LMI_1^*\) as the solution of this equation: \[\begin{equation} \sum_i \mathrm{E} (X_i) - \sum_i \min\left\{1,\frac{LMI_1^*~ [\mathrm{E} ~X_i]}{\sigma_i^2 } \right\}\mathrm{E} (X_i)= RTC_{max} \tag{11.18} \end{equation}\] and then determine the optimal coefficients as \[\begin{equation} c_i^* = \min\left\{1,\frac{LMI_1^*~ [\mathrm{E} ~X_i]}{\sigma_i^2 } \right\}. \tag{11.19} \end{equation}\]
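The recipe in part (d) can be illustrated numerically. The means, variances, and budget below are invented inputs (with uncorrelated risks, as assumed in part (a)); the multiplier \(LMI_1^*\) is found by root-finding on equation (11.18) and the coefficients follow from equation (11.19):

```python
import numpy as np
from scipy.optimize import brentq

# invented inputs: three uncorrelated risks and a budget
mu = np.array([10.0, 20.0, 30.0])      # E(X_i)
sig2 = np.array([25.0, 100.0, 400.0])  # sigma_i^2
RTC_max = 20.0

def c_star(lmi1):
    """Optimal coefficients for a trial multiplier, equation (11.19)."""
    return np.minimum(1.0, lmi1 * mu / sig2)

def budget_gap(lmi1):
    """Left-hand side minus right-hand side of equation (11.18)."""
    return mu.sum() - (c_star(lmi1) * mu).sum() - RTC_max

lmi1 = brentq(budget_gap, 0.0, 1e6)    # gap decreases in lmi1
c = c_star(lmi1)
```

With these inputs, \(LMI_1^* = 4.8\) and \(\mathbf{c}^* = (1, 0.96, 0.36)\), which spends the budget exactly: \((1-0.96)\cdot 20 + (1-0.36)\cdot 30 = 20\).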

Show the Exercise 11.1 Solution

Exercise 11.2. Lasso Regression Conditions. To get additional practice with the KKT conditions, we now apply them to the lasso regression problem introduced in Exercise 3.1. Similar to Exercise 3.1, we consider the problem \[ {\small \boxed{ \begin{array}{cc} {\small \text{minimize}}_{\boldsymbol \beta} & f_{0,ls}(\boldsymbol \beta) = \frac{1}{2} \sum_{i=1}^n (y_i - {\bf x}_i^{\prime} \boldsymbol \beta)^2 \\ {\small \text{subject to}} & \sum_{j=1}^p |\beta_j | \le c_{lasso} , \end{array} } } \] with observations \(({\bf x}_i, y_i)\), for \(i=1,\ldots, n\). If \(c_{lasso}\) is large enough, then the constraint has no effect and the resulting estimator is \(\hat{\boldsymbol \beta}^{ols}\) \(= \left[ {\bf X}^{\prime}{\bf X}\right]^{-1} {\bf X}'{\bf y}\), where \({\bf y}\) is a vector of responses and \({\bf X} = ({\bf x}_1', \ldots, {\bf x}_n')'\) is a matrix of explanatory variables. So, we choose \(c_{lasso}\) to be smaller than \(\sum_{j=1}^p |\hat{\beta}_j^{ols} |\) so the constraint has some effect.

As is common for problems involving absolute values, we can linearize the constraints by taking positive and negative parts. Specifically, define the positive part \(\beta_j^+ = \max(0,\beta_j)\) and the negative part \(\beta_j^- = \max(0,-\beta_j)\). Thus, the problem becomes \[ {\small \boxed{ \begin{array}{cc} {\small \text{minimize}}_{\boldsymbol \beta^+,\boldsymbol \beta^-} & f_{0,ls}(\boldsymbol \beta^+,\boldsymbol \beta^-) = \frac{1}{2} \sum_{i=1}^n [y_i - {\bf x}_i^{\prime} (\boldsymbol \beta^+ - \boldsymbol \beta^-)]^2 \\ {\small \text{subject to}} & \sum_{j=1}^p (\beta_j^+ +\beta_j^- ) \le c_{lasso} . \end{array} } } \] a. Confirm that the Lagrangian can be expressed as \[ \begin{array}{ll} LA(\boldsymbol \beta^+,\boldsymbol \beta^-) = & f_{0,ls}(\boldsymbol \beta^+,\boldsymbol \beta^-) +LMI_1[ \sum_{j=1}^p (\beta_j^+ +\beta_j^- ) -c_{lasso}] \\ &- \sum_{j=1}^p LMI_{1+j}^+ ~\beta_j^+ - \sum_{j=1}^p LMI_{1+j}^- ~\beta_j^- . \end{array} \]
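As a numerical aside, the linearized problem can be handed to a general-purpose constrained solver. The synthetic data, the choice of \(c_{lasso}\), and the use of scipy's SLSQP method are all assumptions made for illustration, not part of the exercise:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 50, 3
Xm = rng.normal(size=(n, p))                       # synthetic design matrix
y = Xm @ np.array([2.0, -1.0, 0.0]) + 0.1 * rng.normal(size=n)
c_lasso = 1.5                                       # chosen so the constraint binds

def obj(bpm):
    """Least squares objective; bpm stacks (beta_plus, beta_minus)."""
    beta = bpm[:p] - bpm[p:]
    resid = y - Xm @ beta
    return 0.5 * resid @ resid

cons = {"type": "ineq", "fun": lambda bpm: c_lasso - bpm.sum()}
res = minimize(obj, np.zeros(2 * p), bounds=[(0, None)] * (2 * p),
               constraints=cons, method="SLSQP")
beta = res.x[:p] - res.x[p:]                        # recover signed coefficients
```

The nonnegativity bounds and the single linear inequality are exactly the linearized constraints above, so the solver's multipliers play the roles of \(LMI_1\), \(LMI_{1+j}^+\), and \(LMI_{1+j}^-\).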

b. Show that \[ \partial_{\boldsymbol \beta^+}~f_{0,ls}(\boldsymbol \beta^+,\boldsymbol \beta^-) = -{\bf X}^{\prime}{\bf e} = -\partial_{\boldsymbol \beta^-}~f_{0,ls}(\boldsymbol \beta^+,\boldsymbol \beta^-) , \] where \({\bf e}={\bf y} - {\bf X}\boldsymbol \beta\) is a vector of residuals.
c. At the optimum, show that \(\beta_j^+ >0\) implies \(\beta_j^- =0\) and vice-versa, that \(\beta_j^- >0\) implies \(\beta_j^+ =0\). Thus, there is no contradiction, as one would hope.
d. Let \({\bf x}^{(j)}\) be the \(j\)th column of \(\bf X\). At the optimum, show that \({\bf x}^{(j)'}{\bf e}= 0\) implies \(\beta_j = 0\).

Show the Exercise 11.2 Solution

Exercise 11.3. Linearity of Lasso Regression Conditions. This exercise is motivated by Exercise 3.27 of Hastie, Tibshirani, and Friedman (2009). Similar to Exercise 3.1 and Exercise 11.2, we consider an equivalent way of writing the problem using the penalized version \[ \begin{array}{cc} {\small \text{minimize}}_{\boldsymbol \beta^+,\boldsymbol \beta^-} & f_{0,ls}(\boldsymbol \beta^+,\boldsymbol \beta^-) + \lambda \sum_{j=1}^p (\beta_j^+ +\beta_j^- ) . \end{array} \] The Lagrangian has essentially the same form as in Exercise 11.2 \[ \begin{array}{ll} LA(\boldsymbol \beta^+,\boldsymbol \beta^-) = & f_{0,ls}(\boldsymbol \beta^+,\boldsymbol \beta^-) +\lambda \sum_{j=1}^p (\beta_j^+ +\beta_j^- ) \\ &- \sum_{j=1}^p LMI_{1+j}^+ ~\beta_j^+ - \sum_{j=1}^p LMI_{1+j}^- ~\beta_j^- , \end{array} \] and so the solution set is the same.

a. At the optimum, show that \(\beta_j \ne 0\) implies \(\lambda =- sgn(\beta_j) ~{\bf x}^{(j)'}\left[{\bf y} - {\bf X}\boldsymbol \beta \right]\).
b. At the optimum, show that the Hessian is \[ \begin{array}{ll} \nabla^2 LA &= \left(\begin{array}{cc} {\bf X}^{\prime}{\bf X} & -{\bf X}^{\prime}{\bf X}\\ -{\bf X}^{\prime}{\bf X} & {\bf X}^{\prime}{\bf X} \end{array} \right) .\\ \end{array} \]

c. Suppose that we perturb the penalty parameter \(\lambda\) slightly and solve the problem for \(\lambda_{\delta}=\lambda + \delta\). Call the perturbed solution \(\boldsymbol \beta_{\delta}\). Then, if \(sgn(\beta_j)>0\) show that \[ \begin{array}{ll} \delta & = {\bf x}^{(j)'}{\bf X} \left[\boldsymbol \beta_{\delta} -\boldsymbol \beta \right] \\ \end{array} \]

d. At the optimum, use the Perturbation Sensitivity Proposition from Section 10.1 to show that \[ - {\bf 1}_{A}= [\nabla^2 LA]_A ~\partial_{\lambda} (\boldsymbol \beta^+,\boldsymbol \beta^-)_A , \] where the subscript \(A\) means that we restrict the rows and columns to be those in the active set (where the corresponding parameters are not zero). Thus, the vector of derivatives \(\partial_{\lambda} (\boldsymbol \beta^+,\boldsymbol \beta^-)_A\) is constant in the penalty parameter \(\lambda\), meaning that the solution is locally linear. (See Rosset and Zhu (2007) for extensions of this idea to more general settings.)

Show the Exercise 11.3 Solution

Exercise 11.4. Standard Errors of Ratios. To determine the accuracy of the \(RM^2\) in Example 11.4.5, note that it can be expressed as the ratio of two averages. Especially with simulation applications, it can be useful to understand the reliability of this type of statistic.

To this end, let \(\{ (x_{11},x_{21}), \ldots, (x_{1R},x_{2R})\}\) be a random sample of size \(R\). Summarize this sample with means \(\bar{x}_1\) and \(\bar{x}_2\), sample variances \(s_1^2\) and \(s_2^2\), coefficients of variation \(CV_1 = s_1/\bar{x}_1\) and \(CV_2 = s_2/\bar{x}_2\), and correlation coefficient \(r_{12} = [(R-1) s_1 ~ s_2]^{-1} \left[\sum_{r=1}^R (x_{1r}-\bar{x}_1)(x_{2r}-\bar{x}_2)\right]\). It is known from the mathematical statistics literature (see, for example, Levy and Lemeshow (2013)), that a desirable approximate standard error for the ratio of the means is \[ se\left(\frac{\bar{x}_1}{\bar{x}_2}\right) = \frac{(\bar{x}_1/\bar{x}_2)}{\sqrt{R}} \sqrt{CV_1^2 + CV_2^2 - 2r_{12}CV_1 CV_2 } . \]
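The formula above translates directly into code. This sketch uses only the standard library; the toy data in the usage note are invented:

```python
import math

def se_ratio(x1, x2):
    """Approximate standard error of the ratio of sample means."""
    R = len(x1)
    m1, m2 = sum(x1) / R, sum(x2) / R
    s1 = math.sqrt(sum((v - m1) ** 2 for v in x1) / (R - 1))
    s2 = math.sqrt(sum((v - m2) ** 2 for v in x2) / (R - 1))
    cv1, cv2 = s1 / m1, s2 / m2
    r12 = sum((a - m1) * (b - m2)
              for a, b in zip(x1, x2)) / ((R - 1) * s1 * s2)
    # guard against a tiny negative radicand from floating-point rounding
    radicand = max(0.0, cv1 ** 2 + cv2 ** 2 - 2 * r12 * cv1 * cv2)
    return (m1 / m2) / math.sqrt(R) * math.sqrt(radicand)
```

As a sanity check, when \(x_2\) is an exact multiple of \(x_1\) the ratio of means is constant across resamples and the standard error is zero.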

a. For the risk measure relative marginal change in equation (11.14), identify the two variables \(x_1\) and \(x_2\) that you would use to compute the standard error.
b. Suppose that both \(x_1\) and \(x_2\) are binary variables (as in the upper limit special case). Show that \(s_j^2 \approx \bar{x}_j(1-\bar{x}_j)\), for \(j=1,2\), and the correlation coefficient reduces to \[ r_{12} \approx \frac{\bar{x}_{12} - \bar{x}_1\bar{x}_2}{\sqrt{ \bar{x}_1(1-\bar{x}_1) ~ \bar{x}_2(1-\bar{x}_2)}} , \] where \(\bar{x}_{12} = (\sum_{r=1}^R x_{1r}x_{2r})/R\).
c. Suppose further that \(x_{1r}=0 \implies x_{2r}=0\). Then show that the correlation coefficient can be expressed in an odds ratio form \[ r_{12} = \sqrt{\frac{(1- \bar{x}_1)/\bar{x}_1}{ (1-\bar{x}_2)/\bar{x}_{2}}} . \]

Show the Exercise 11.4 Solution

Exercise 11.5. Quantile Regression Approach. This is a follow-up to Exercise 7.1.1 where we explored the relationship between quantile regression and the auxiliary version of expected shortfall. In that exercise, we defined the objective functions \[ \begin{array}{ll} f_{QR,0}(z_0, \boldsymbol \theta) &= \mathrm{E}_R~ \phi_{\alpha}[g({\bf X};\boldsymbol \theta)-z_0] \\ f_{ES,0}(z_0, \boldsymbol \theta) &=z_0 + \frac{1}{(1-\alpha)} \mathrm{E}_R [g({\bf X};\boldsymbol \theta) - z_0]_+ \end{array} \] and showed their relationship through the expression \[ f_{ES,0}(z_0, \boldsymbol \theta) = \mathrm{E}_R ~g({\bf X};\boldsymbol \theta) + \frac{1}{1-\alpha} f_{QR,0}(z_0, \boldsymbol \theta) . ~~~~(7.12) \] Now, assume that we only have a budget constraint of the form \(RTC(\boldsymbol \theta) \le RTC_{max}\).

a. Show that the derivatives of the Lagrangians for the two problems can be expressed as the following. \[ \begin{array}{ll} \partial_{\theta_j} LA_{QR} &= ~~~~~~~~~\partial_{\theta_j} f_{QR,0}(z_0, \boldsymbol \theta) - \bar{g}_j ~~~LMI_{1,QR}\\ \partial_{\theta_j} LA_{ES} &= \frac{1}{1-\alpha} \left\{\partial_{\theta_j} f_{QR,0}(z_0, \boldsymbol \theta) - \bar{g}_j (1-\alpha)(LMI_{1,ES}-1) \right\} .\\ \end{array} \]

b. Interpret this to mean that the two algorithms will converge to the same solutions, so analysts need only implement one of them. Moreover, the two approaches will generally have the same level of computational difficulty.

Show the Exercise 11.5 Solution

11.5.3 Appendix. Establishing the KKT Conditions

A proof to establish the KKT conditions for a general setting is cumbersome and not very insightful. Fortunately, there are several good resources where readers can access the details, including Boyd and Vandenberghe (2004), Simon and Blume (1994), and Nocedal and Wright (2006).

In the following, we provide sketches of proofs in two important special cases: a single equality constraint and a single inequality constraint. These sketches provide some useful intuition into the underpinnings of the KKT conditions.

11.5.3.1 Single Equality Constraint

We now restrict considerations to a single equality constraint so that the problem in Display (3.9) reduces to \[ \boxed{ \begin{array}{ccc} {\small \text{minimize}} & f_0({\bf z}) & \\ {\small \text{subject to}} & f_{con,1}({\bf z}) = 0 & \\ \end{array} } \] and corresponding Lagrangian \[ LA\left({\bf z},LME_1\right) = f_0({\bf z}) + LME_1 ~f_{con,1}({\bf z}) . \] In this setting, Display (11.1) reduces to \[ \boxed{ \begin{array}{ll} \partial_{z_i} ~ \left. LA\left({\bf z},LME_1^* \right) \right|_{{\bf z}={\bf z}^*} = 0 & i=1, \ldots, p_z \\ f_{con,1}({\bf z}^*) = 0 & . \\ \end{array} } \] These are the classical conditions that appear in many beginning economics and calculus courses.
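For a quadratic objective and a linear equality constraint, these classical conditions form a linear system that can be solved directly. The toy problem below (minimize \(z_1^2+z_2^2\) subject to \(z_1+z_2=1\)) is invented for illustration:

```python
import numpy as np

# minimize f0(z) = 0.5 z'Qz  subject to  a'z - 1 = 0
# stationarity: Qz* + LME* a = 0;  feasibility: a'z* = 1
Q = np.diag([2.0, 2.0])
a = np.array([1.0, 1.0])

# assemble and solve the linear KKT system [[Q, a],[a', 0]] [z; LME] = [0; 1]
KKT = np.block([[Q, a[:, None]], [a[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(KKT, np.array([0.0, 0.0, 1.0]))
z_star, lme_star = sol[:2], sol[2]
```

Here the solution splits the constraint evenly, \({\bf z}^* = (0.5, 0.5)\), with multiplier \(LME_1^* = -1\).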

\(Under~the~Hood.\) Show the Justification of KKT Conditions for a Single Equality Constraint

11.5.3.2 Single Inequality Constraint

We now restrict considerations to a single inequality constraint so that the problem in Display (3.9) reduces to \[ \boxed{ \begin{array}{ccc} {\small \text{minimize}} & f_0({\bf z}) & \\ {\small \text{subject to}} & f_{con,1}({\bf z}) \le 0 & \\ \end{array} } \] and corresponding Lagrangian \[\begin{equation} LA\left({\bf z},LMI\right) = f_0({\bf z}) + LMI ~f_{con,1}({\bf z}) . \tag{11.20} \end{equation}\] In this setting, Display (11.1) reduces to \[ \boxed{ \begin{array}{ll} \partial_{z_i} ~ \left. LA\left({\bf z},LMI^* \right) \right|_{{\bf z}={\bf z}^*} = 0 & i=1, \ldots, p_z \\ LMI^* \times f_{con,1}({\bf z}^*) = 0 & \\ LMI^* \ge 0 \\ f_{con,1}({\bf z}^*) \le 0 . & \\ \end{array} } \]
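A toy numerical check of complementary slackness (the objective and constraint below are invented): when the constraint is slack at the optimum the multiplier must be zero, and when it binds the minimizer sits on the boundary.

```python
from scipy.optimize import minimize

def solve(c):
    """Minimize (z - 2)^2 subject to z - c <= 0, for a given bound c."""
    res = minimize(lambda z: (z[0] - 2.0) ** 2, x0=[0.0],
                   constraints={"type": "ineq", "fun": lambda z: c - z[0]})
    return res.x[0]

# c = 3: constraint slack at z* = 2, so complementary slackness forces LMI* = 0
# c = 1: constraint binds, z* = 1, and stationarity gives LMI* = -f0'(1) = 2 > 0
```

This mirrors the two cases of \(LMI^* \times f_{con,1}({\bf z}^*) = 0\): either the multiplier vanishes or the constraint holds with equality.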

\(Under~the~Hood.\) Show the Justification of KKT Conditions for a Single Inequality Constraint