Sherman Lo - 2021
The compound Poisson distribution was used to model precipitation (Revfeim, 1984; García, 1995; Dunn, 2004; Dzupire, 2018), insurance claims (Jørgensen, 1994; Smyth, 2002) and x-ray images (Whiting, 2002; Elbakri, 2003; Whiting, 2006), the latter was the topic of a PhD thesis (Lo, 2020). The distribution is interesting because it models zero-inflated positive numbers and it is in the exponential family under certain conditions (Jørgensen, 1987). This report covers the compound Poisson and extends it to a time series in the context of precipitation forecasting.
Many materials presented here were copied from (Lo, 2020) which includes further algebra.
The notation used may vary compared to other literature.
A special case occurs when \(U_i\sim\mathrm{Gamma}\left(\alpha,\beta\right)\) where \(\alpha>0\) is the gamma shape parameter and \(\beta>0\) is the gamma rate parameter. Then it can be shown that \(Y|Z\sim\mathrm{Gamma}(Z\alpha, \beta)\). The random variable \(Y\) follows the compound Poisson distribution, sometimes called the compound Poisson-Gamma distribution, and is denoted by \(Y\sim\mathrm{CP}(\lambda,\alpha,\beta)\). The p.d.f. is \[ p_Y(y) = \begin{cases} \delta(y) \mathrm{e}^{-\lambda} & \text{ for } y=0 \\ \displaystyle\sum_{z=1}^{\infty}\dfrac{\beta^{z\alpha}}{\Gamma(z\alpha)}y^{z\alpha-1}\mathrm{e}^{-\beta y}\mathrm{e}^{-\lambda}\frac{\lambda^z}{z!} & \text{ for } y>0 \end{cases} \ . \label{eq:compoundPoisson_pdf} \]
It can be shown that the compound Poisson distribution is in the exponential family for known \(\alpha\) (Jørgensen, 1987). To show this, the compound Poisson distribution was parametrised using the following: \[ p=\frac{2+\alpha}{1+\alpha} \ , \] \[ \mu=\frac{\lambda\alpha}{\beta} \ , \] \[ \phi = \frac{\alpha+1}{\beta^{2-p}(\lambda\alpha)^{p-1}} \ . \] The parameters \(p\), \(\mu\) and \(\phi\) are called the index, mean and dispersion parameters respectively and take the values of \( 1 < p < 2 \), \( \mu> 0 \) and \( \phi > 0 \). It can be shown that the p.m.f. at zero is \[ \mathbb{P}(Y=0) = \exp \left[ -\frac{\mu^{2-p}}{\phi(2-p)} \right] \] and the p.d.f. for \(y>0\) is \[ p_Y(y) = \exp\left[ \frac{1}{\phi} \left( y\frac{\mu^{1-p}}{1-p}-\frac{\mu^{2-p}}{2-p} \right) \right] \frac{1}{y} \sum_{z=1}^{\infty}W_z(y,p,\phi) \] where \[ W_z = W_z(y,p,\phi)=\frac{y^{z\alpha}}{\phi^{z(1+\alpha)}(p-1)^{z\alpha}(2-p)^zz!\Gamma(z\alpha)} \ . \] This is in the form of a distribution in the dispersive exponential family (Nelder and Wedderburn, 1972; Nelder and Baker, 1972; McCullagh, 1984) \[ p_Y(y)=\dfrac{\exp\left(y\theta\right)g(y)}{Z(\theta)} \] where \(Z(\theta)\) is the partition function and \(\theta\) is the natural parameter. For the compound-Poisson, the natural parameter is \[ \theta = \theta(\mu) = \dfrac{\mu^{1-p}}{\phi(1-p)} \label{eq:appendix_cp_naturalParameter} \] and the partition function is \[ Z = Z(\mu) = \exp\left[ \frac{\mu^{2-p}}{\phi(2-p)} \right] \] or in terms of the natural parameter \[ Z = Z(\theta) = \exp\left[ \phi^{\frac{1}{1-p}} \cdot \theta^{\frac{2-p}{1-p}} \cdot \dfrac{ (1-p)^{\frac{2-p}{1-p}} }{ 2-p } \right] \ . \]
The partition function has some useful properties such as \[ \mathbb{E}[Y] = \dfrac{\partial \ln Z}{\partial \theta} \] and \[ \mathbb{V}\mathrm{ar}\left[Y\right] = \dfrac{\partial^2\ln Z}{\partial \theta^2} \ . \label{eq:appendix_variancePartitionFunction} \] Thus it can be shown that \[ \mathbb{E}[Y] = \mu \] and \[ \mathbb{V}\mathrm{ar}[Y] = \phi \mu^p \] which shows that the compound Poisson distribution is in the Tweedie dispersion exponential family with \( 1 < p < 2 \), making it suitable for generalised linear models.
Suppose \(Y_i\sim\mathrm{CP}(\lambda, \alpha, \beta)\) for \(i=1,2,\ldots,N\) and i.i.d. with the following observations \(y_1, y_2, \ldots, y_N\) respectively. It is possible to estimate the parameters \(\lambda, \alpha\) and \(\beta\) numerically using using maximum likelihood where the log likelihood is \(\ln L=\sum_{i=1}^N\ln p_Y(y_i)\). When evaluating the log likelihood, thus the p.d.f. as well, the infinite sum \(\sum_{z=1}^{\infty}W_z(y,p,\phi)\) can be calculated approximately by summing over a finite number of large terms (Dunn, 2005). One method to maximise the likelihood is the EM algorithm (Dempster, 1977) by iteratively estimating \(Z_i|Y_i,\lambda,\alpha,\beta\) for \(i=1,2,\ldots,N\) followed by estimating \(\lambda,\alpha,\beta|Z_1,Y_1,Z_2,Y_2,\ldots,Z_N,Y_N\). This was explored in (Lo, 2020) but required numerous approximations.
Method of moments estimators does exist (Withers and Nadarajah, 2011).
For known \(p\) or \(\alpha\), the compound Poisson is in the exponential family so parameter estimation can be done via the generalised linear model framework (see the R package cpglm) (Zhang, 2013). Estimating \(p\) is difficult and various methods were discussed (Zhang, 2013). One way is to estimate \(\mu\) and \(\phi\) on a grid of \(p\)'s and then select the \(p\) which maximises the likelihood (Dunn, 2005).
In the precipitation forecasting problem, a time series for the model fields \(x_1, x_2, \ldots, x_T\) is given. The objective is to use the model fields as explanatory variables for the time series of the precipitation \(Y_1, Y_2, \ldots, Y_T\). The time series can be simplified by fixing \(\alpha\) so that the distribution of the time series is in the exponential family. Such a simplification can introduce auto-correlation behaviour using the generalised autoregressive moving-average (GARMA) model (Benjamin, 1998).
To study the compound Poisson in the exponential family, the compound Poisson is to be parameterised as \[ Y_t \sim \mathrm{CP}\left( \dfrac{\mu_t^{2-p}}{\phi(2-p)}, \alpha, \dfrac{1}{\phi(p-1)\mu_t^{p-1}} \right) \] so that the index parameter, \(p\), and the dispersion parameter, \(\phi\), are fixed. The mean parameters, \(\mu_1, \mu_2, \ldots, \mu_T\), can be made as a function of the model fields \[ \eta_t=g(\mu_t) \] where \(g(\mu)\) is some link function and \(\eta_t=\eta(x_t)\) is the systematic component. An example of a systematic component is a basic linear relationship \[ \eta_t=x_t \beta \] where \(\beta\) is the regression parameter.
The choice of the link function \(g(\mu_t)\) is left as a free choice. But because a compound Poisson random variable can take non-negative values, it is recommended that the link function has a domain of \(0 < \mu < \infty\) and range \(-\infty < g(\mu_t) < \infty\). An example would be the log link function \[ g(\mu_t)=\ln(\mu_t) \ . \]
The GARMA model (Benjamin, 1998) include autoregressive and moving-average terms \[ \eta_t=x_t \beta + \sum_{i=1:t>i}^r \phi_i \left(g(y_{t-i})-x_{t-i}\beta\right) + \sum_{j=1:t>j}^s\theta_j\left( g(y_{t-j})-\eta_{t-j} \right) \] where \(\phi_1,\phi_2,\ldots,\phi_r\) are autoregressive (AR) parameters and \(\theta_1,\theta_2,\ldots,\theta_s\) are moving-average (MA) parameters. (Benjamin 1998) used the Normal, Gamma and Poisson distributions as examples of distributions from the exponential family. However, when used with the log link function, the AR and MA components for GARMA are not suitable for the compound Poisson because the AR and MA components would take values of \(-\infty\) when \(y_{t-1}=0\), i.e. when it has not rained on the previous day. A way around it is to replace \(y_t\) with \(y_t^*\) such that \[ \eta_t = x_t \beta + \sum_{i=1:t>i}^r \phi_i \left(g(y_{t-i}^*)-x_{t-i}\beta\right) \\+ \sum_{j=1:t>j}^s\theta_j\left( g(y_{t-j}^*)-\eta_{t-j} \right) \] where \[ y_t^*=\max(y_t, c) \] and \(c\) is a small positive number (Benjamin, 1998). This allows the evaluation of \(g(y_{t-i}^*)\) to be possible for \(y_{t-i}=0\).
Another option is to remove the AR element and consider positive \(\theta_j\). This can be studied by considering one MA term \[ \eta_t = x_t \beta + \theta_1\left( g(y_{t-1})-\eta_{t-1} \right) \] and to investigate the limiting scenarios. Consider a finite \(\eta_{t-1}\) which implies a finite \(\mu_{t-1}\) and a non-zero probability that \(Y_{t-1}=0\). If \(y_{t-1}=0\) was observed, then this would cause \(g(y_{t-1})\rightarrow -\infty\), \(\eta_{t}\rightarrow -\infty\) and \(\mu_t\rightarrow 0\). In this limiting case, \(Y_t=0\) with probability one. For the following day, the systematic component is \( \eta_{t+1} = x_{t+1} \beta + \theta_1\left( g(y_{t})-\eta_{t} \right) \). Because \(y_t=0\), \(g(y_t)\rightarrow - \infty\) and \(\eta_t\rightarrow - \infty\), it could be suggested that \(g(y_t)-\eta_t=0\) which would make is possible for \(\eta_{t+1}\) to be finite. Then for non zero \(\theta_1,\theta_2,\ldots,\theta_s\), another version of the systematic component is \[ \eta_t = x_t\beta + \sum_{j=1:t>j}^s\theta_j\Theta_{t-j} \] where \[ \Theta_{t} = \begin{cases} -\infty & \text{if }y_t=0\text{ and } \mu_t\neq 0 \\ 0 & \text{if }y_t=0\text{ and } \mu_t= 0 \\ g(y_{t})-\eta_{t} & \text{otherwise} \end{cases} \ . \] This version may be too restrictive, for example, dry days can only occur on consecutive days.
There may be a possibility to relax the fixed \(p\) and \(\phi\) assumption but it shall not be discussed here.
The GARMA model for the compound Poisson is unsuitable to model precipitation because the assumptions required produce unrealistic results, e.g. dry days can only occur on consecutive days. This stems from the fact that the log link function is applied to realisations of the compound Poisson, \(g(y_t)\), in the systematic component and problems arise on dry days, \(y_t=0\).
Methods for parameter fitting, model selection and forecasting are omitted but important when applying to precipitation forecasting. Model selection investigates how many AR and MA terms to use and what model fields to include in the systematic component. For model fitting, numerical methods would be required to maximise a likelihood or to study a posterior distribution using methods such as gradient descent or Markov chains using Monte Carlo (Brooks, 2011).
Any stationarity assumptions were not investigated here but it has been demonstrated that the GARMA model for the compound Poisson is quite restrictive.
The model can be further generalised, for example, the systematic component could be \(\eta_t=f_\beta(x_t)\) where \(f_\beta(x_t)\) is a neural network parameterised by \(\beta\) to allow for non-linear relationships.
Other distributions to model precipitation were studied, for example, using a mixture of zero mass and Gamma distributions (Holsclaw, 2017; Bertolacci, 2019). It could be argued that the compound Poisson has some strengths, for example, the occurrences of rain can be modelled as well. High winds could contribute to higher occurrences of rain by moving scattered rain clouds across the country.
Another advantage is that the compound Poisson distribution is in the exponential family which could lead to simpler mathematics and the potential to use existing methods.
Forecasting precipitation becomes interesting when forecasting multiple locations on a grid. Figure 2 shows observed precipitation from the E-obs dataset (Cornes et al., 2019). Realistically, rain clouds form shapes and a smooth boundary between zero and positive precipitation can be seen in the observed data. Spatial forecasting should aim to replicate such spatial structure if realism is important.
Suppose each point on the grid has coordinates \(\mathbf{r}_1,\mathbf{r}_2,\ldots,\mathbf{r}_M\). A simple strategy to model multiple locations is to fit the model to each location independently; each location will have different inferred parameters which can be learnt independently. Another option is to put a Gaussian process prior (Williams and Rasmussen, 1996) on the parameters to allow them to be spatially smooth. However, these methods are not realistic because the resulting model does not form rain clouds. Even if each location has the same parameters, the realised precipitation is not spatially smooth. Imagine the entire grid has the state space \(\lambda_t=1\), the realised grid of \(Z_t\)'s would be wet and drypoints scattered randomly across the grid; this is not characteristic of rain clouds.
Rain clouds can be formed using a graphical model for the Poisson-element (Yang et al., 2013). Consider a day \(t\) and let \(Z_{\mathbf{r}_1}, Z_{\mathbf{r}_2}, \ldots, Z_{\mathbf{r}_M}\) be the latent Poisson variable with respective rate parameters \(\lambda_{\mathbf{r}_1}, \lambda_{\mathbf{r}_2}, \ldots, \lambda_{\mathbf{r}_M}\) for the \(M\) grid points. Let \(Q(\mathbf{r}_i)\) be a collection of four coordinates which neighbours \(\mathbf{r}_i\). The graphical model should persuade \(Z_{\mathbf{r}_i}\) to take the same, or similar, value as \(Z_\mathbf{s}\) for \(\mathbf{s}\in Q(\mathbf{r}_i)\) to form rain clouds. This can be done by adjusting the conditional p.m.f. of \(Z_{\mathbf{r}_i}\) to \[ \mathbb{P}(Z_{\mathbf{r}_i} = z| Z_\mathbf{s} : \mathbf{s}\in Q(\mathbf{r}_i)) \propto \exp\left[ z\ln \lambda_{\mathbf{r}_i} - \ln{z!} - \sum_{\mathbf{s}\in Q(\mathbf{r}_i)} \gamma \left(z - Z_\mathbf{s}\right)^2 \right] \] where \(\gamma\) is the interaction parameter. The interaction term decreases the p.m.f. for values of \(z\) which deviates too much from it neighbours which should hopefully encourages the forming of rain clouds. However, as with many graphical models, inference can be computationally intensive, let alone for \(T\) lots of graphical models to form a spatial forecast.