Multivariate Variables

Insurance data are typically multivariate in the sense that we take many measurements on each entity. For example, suppose that the entity is a manufacturing firm and you wish to study losses associated with its workers’ compensation plan. In this case, you might want to know the location of the firm’s manufacturing plants, the industry in which it operates, the number of employees, and so forth. If there are many variables, such data are also known as high dimensional.

The usual strategy for analyzing multivariate data is to begin by examining each variable in isolation from the others. This is known as a univariate approach. When only one measurement is considered, variables are scalars and, as described, can be thought of broadly as either qualitative or quantitative.

In contrast, for some variables, it makes little sense to look only at one-dimensional aspects. For example, insurers typically organize spatial data by longitude and latitude to analyze the location of weather-related insurance claims due to hailstorms. Having only a single number, either longitude or latitude, provides little information for understanding geographical location.

Another, less obvious, special case of a multivariate variable involves coding for missing data. Historically, some statistical packages used a “-99” to report when a variable, such as an insured’s age, was not available or not reported. This led many unsuspecting analysts to report strange statistics when summarizing a set of data. When data are missing, it is better to think about the variable as having two dimensions, one to indicate whether or not the variable is reported and the second providing the age (if reported).
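To make the two-dimensional coding concrete, here is a minimal Python sketch. The ages are made-up illustrative values and the names (`raw_ages`, `coded`, and so on) are hypothetical; only the “-99” sentinel comes from the discussion above.

```python
from statistics import mean

MISSING_CODE = -99  # sentinel some older packages used for "not reported"

# Hypothetical ages for six insureds; two are not reported.
raw_ages = [34, -99, 51, 47, -99, 28]

# Naive summary: the sentinel is treated as if it were a real age.
naive_mean = mean(raw_ages)

# Two-dimensional coding: (is the age reported?, the age if reported).
coded = [
    (age != MISSING_CODE, age if age != MISSING_CODE else None)
    for age in raw_ages
]

# Summaries now use each dimension for its own purpose.
reported_ages = [age for reported, age in coded if reported]
proper_mean = mean(reported_ages)
share_reported = sum(reported for reported, _ in coded) / len(coded)

print(naive_mean, proper_mean, share_reported)
```

The naive mean is negative, which is the kind of strange statistic described above, whereas the mean over reported ages and the share of reported values are each meaningful on their own.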

In the same way, insurance data are commonly censored and truncated. To illustrate, automobile claims may be limited, or censored, at 500,000, the upper limit that the insurer will pay. The loss amount may be in excess of 500,000, but the insurer is only aware of its payout. To record censored claims, a binary variable is used to indicate whether or not the claim is censored (limited) and a second variable is used to indicate the payout. In the same way, claims may be truncated by a deductible. Although there are many types of deductibles, in a common form the insurer pays the amount in excess of a deductible. To illustrate, suppose you have an auto policy with a 250 deductible. If you have a 1,000 loss, then the insurer pays 750. If you have a 200 loss, then the insurer pays nothing. In principle, one would like to use a binary variable to indicate whether or not the claim has a deductible and a second variable to indicate the payout. As we will see, the tricky thing about deductibles is that for many sampling schemes, the insurer does not observe a claim if the loss falls below the deductible amount. More on this topic later.
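The arithmetic of deductibles and upper limits can be sketched in a few lines of Python. The function name `insurer_payout` and the choice to apply the limit to the amount in excess of the deductible are assumptions made for illustration; actual policy terms vary.

```python
def insurer_payout(loss, deductible=0.0, limit=float("inf")):
    """Return (payout, is_censored) for a single loss.

    Assumes the insurer pays the loss in excess of the deductible,
    capped at the upper limit (a simplification for illustration).
    """
    excess = max(loss - deductible, 0.0)
    # When the excess reaches the limit, the insurer's records show only
    # the limit, so the claim is flagged as censored.
    is_censored = excess >= limit
    return min(excess, limit), is_censored


# The examples from the text: a 250 deductible.
print(insurer_payout(1000, deductible=250))  # pays 750
print(insurer_payout(200, deductible=250))   # pays nothing

# A loss above the 500,000 upper limit is censored.
print(insurer_payout(800_000, limit=500_000))

# Truncation: under many sampling schemes, losses at or below the
# deductible never appear in the insurer's data at all.
losses = [200, 1000, 800_000]
observed = [insurer_payout(x, 250, 500_000) for x in losses if x > 250]
print(observed)
```

Note how each recorded claim carries two pieces of information, the payout and the censoring indicator, while the 200 loss leaves no record at all in `observed`.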

Aggregate claims can also be coded as another special type of multivariate variable. In this situation, an insurer has potentially zero, one, two, or more claims within a policy period. Each claim has its own level (possibly mediated by deductibles and upper limits) and there is an uncertain, or random, number of claims for each individual. This is a case where the dimension of the multivariate variable is not known in advance.
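One way to picture a variable whose dimension is not known in advance is as a list of claim amounts whose length differs from policy to policy. The following Python sketch uses hypothetical policy identifiers and claim amounts.

```python
# Hypothetical claim amounts by policy for one policy period.
# The length of each list (the number of claims) varies by policy.
policy_claims = {
    "P001": [],                      # no claims
    "P002": [1200.0],                # one claim
    "P003": [300.0, 4500.0, 150.0],  # three claims
}

# Summarize each policy by its claim count and aggregate claim amount.
summary = {
    policy_id: (len(claims), sum(claims))
    for policy_id, claims in policy_claims.items()
}

for policy_id, (count, total) in summary.items():
    print(policy_id, count, total)
```

Reducing each list to a count and a total is only one possible summary; the full list retains the level of every individual claim.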

Perhaps the most complicated type of “multivariate variable” is a realization of a stochastic process. You will recall that a stochastic process is little more than a collection of random variables. For an insurance special case, we might think about the times that claims arrive at an insurance company over a one-year time horizon. This variable is theoretically infinite dimensional in that most of our models permit an arbitrarily large number of claims. Special techniques are required to understand realizations of stochastic processes; although this is not our focus, it is still helpful to be aware of this variable type.
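As a rough illustration of what such a realization looks like, the Python sketch below simulates claim arrival times over one year. It assumes, purely for illustration, that times between claims are independent and exponentially distributed (a Poisson process); the function name `simulate_arrival_times` and the rate of 5 claims per year are hypothetical.

```python
import random


def simulate_arrival_times(rate_per_year, horizon=1.0, seed=None):
    """Simulate claim arrival times in (0, horizon].

    Assumes exponential waiting times between claims, which is one
    common modeling choice and not the only one.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_year)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate_per_year)
    return times


# Two realizations: the number of arrival times (the dimension of the
# observed variable) can differ from one realization to the next.
for seed in (1, 2):
    arrivals = simulate_arrival_times(rate_per_year=5, seed=seed)
    print(len(arrivals), [round(t, 3) for t in arrivals])
```

Nothing in the model caps the number of arrivals in a year, which is the sense in which the variable is theoretically infinite dimensional.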

Does This Make Sense?

Quiz questions allow for immediate assessment of your understanding of a section. Try them out.

The list with open boxes gives examples of variables that one might encounter in insurance analysis. The shaded text provides general variable types. Associate the variable types with the specific examples by dragging the types into the open boxes for specific examples.
