
For the zero-truncated Negative Binomial specification often used in count regression (e.g., Ch. 8 in this book), I have been unable to find published results against which to verify the derivatives of the log-likelihood with respect to the so-called dispersion parameter, or its reciprocal (which I find more tractable).

Below I provide some context and my best stab at the derivatives I am trying to verify. Grateful for any feedback re:

What I might have done wrong;
Published results (besides the work I cite) that could be used for verification.

Disclaimer: I am posting here after failing to get input in this other forum.

Context

Consider a random count variable with probability mass function (p.m.f.):

\begin{aligned} Y_i \sim \textrm{Z.T. Neg. Bin}(\lambda_i,\alpha) & \equiv \Pr(Y_i = y_i | Y_i > 0) \\ & = \left. \frac{\Gamma(y_i+\alpha)}{\Gamma(y_i+1)\Gamma(\alpha)}\left(\alpha^{-1}\lambda_i\right)^{y_i}\left(1+\alpha^{-1}\lambda_i\right)^{-(\alpha+y_i)} \middle/ \left[1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}\right] \right. \end{aligned}

where:

$y_i > 0 \quad (i=1,2,\dots,n)$ denotes the observed realisations (observations enter a zero-truncated sample only after the first count occurs);
$\lambda_i = e^{\mathbf{x}_i^T\boldsymbol{\beta}}$ is the mean of the unconstrained Negative Binomial, specified through a log link, with $\mathbf{x}^T_i$ being the $i$-th row of the regression data matrix and $\boldsymbol{\beta}$ the unknown vector of regression coefficients (to be estimated);
$\alpha$ is another unknown parameter to be estimated, namely the reciprocal of the so-called dispersion parameter ($\alpha$ is inherited from the Poisson-Gamma mixture that brings about the unconstrained Negative Binomial; see, e.g., p. 745 in this book);
$\Gamma(\cdot)$ is the gamma function.
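As a sanity check on the p.m.f. above, the sketch below evaluates it on a long grid of counts and confirms numerically that it sums to one and that its mean is $\lambda_i / [1-(1+\alpha^{-1}\lambda_i)^{-\alpha}]$. The function name `ztnb_log_pmf`, the test values of $\lambda_i$ and $\alpha$, and the cut-off of the sum are all arbitrary choices made for illustration.

```python
import math

def ztnb_log_pmf(y, lam, alpha):
    """Log p.m.f. of the zero-truncated Negative Binomial for y = 1, 2, ..."""
    log_nb = (
        math.lgamma(y + alpha) - math.lgamma(y + 1) - math.lgamma(alpha)
        + y * math.log(lam / alpha)
        - (alpha + y) * math.log1p(lam / alpha)
    )
    p_zero = (1.0 + lam / alpha) ** (-alpha)  # Pr(Y = 0) before truncation
    return log_nb - math.log(1.0 - p_zero)

lam, alpha = 2.5, 1.7          # arbitrary test values (assumption)
ys = range(1, 2000)            # the tail beyond this point is negligible here
total = sum(math.exp(ztnb_log_pmf(y, lam, alpha)) for y in ys)
mean = sum(y * math.exp(ztnb_log_pmf(y, lam, alpha)) for y in ys)
expected_mean = lam / (1.0 - (1.0 + lam / alpha) ** (-alpha))

print("sum of p.m.f.:", total)
print("mean (numeric vs. formula):", mean, expected_mean)
```

Summing the logs returned by `ztnb_log_pmf` over a sample gives the log-likelihood for fixed $\lambda_i$, which is convenient for finite-difference checks of any derivative.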

The corresponding log-likelihood is:

$$ \ell{\left(\boldsymbol{\beta},\alpha\right)}=\sum_{i=1}^n\left\{\ln{\left[\mathrm{\Gamma}\left(y_i+\alpha\right)\right]}-\ln{\mathrm{\Gamma}\left(\alpha\right)}-\ln{\left(y_i!\right)}+y_i\mathbf{x}_i^T\boldsymbol{\beta}-y_i\ln{\alpha}-\left(\alpha+y_i\right)\ln{\left(1+\alpha^{-1} e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)}-\ln{\left[1-\left(1+\alpha^{-1}e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)^{-\alpha}\right]}\right\} $$

Regarding the derivatives of $\ell\left(\boldsymbol{\beta},\alpha\right)$:

It is straightforward to verify the derivatives of $\ell\left(\boldsymbol{\beta},\alpha\right)$ with respect to the regression parameters $\boldsymbol{\beta}$. Yet those references I use for verification--e.g., this paper--usually omit the derivatives with respect to $\alpha$.
It is rare to find references that mention the derivatives of $\ell\left(\boldsymbol{\beta},\alpha\right)$ with respect to the dispersion parameter $\alpha^{-1}$--e.g., this other paper. Yet those references usually do not provide explicit results I can use directly for validation.

Results to verify

I now move on to the derivatives I would like to verify.

First derivative

I begin by differentiating $\ell(\boldsymbol{\beta},\alpha)$ w.r.t. $\alpha$. For conciseness I write $\lambda_i$ for $e^{\mathbf{x}_i^T\boldsymbol{\beta}}$:

\begin{aligned} \frac{\partial}{\partial\alpha}\ell{\left(\boldsymbol{\beta},\alpha\right)} & = \sum_{i=1}^n\left\{\left[\Psi\left(y_i+\alpha\right)-\Psi\left(\alpha\right)\right]+\ln{\alpha}+1-\ln{\left(\lambda_i+\alpha\right)}-\frac{\left(\alpha+y_i\right)}{\left(\lambda_i+\alpha\right)}-\frac{\partial}{\partial\alpha}\ln{\left[1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}\right]}\right\} \end{aligned}

where $\Psi(\cdot)$ is the digamma function, and the summand, except for its last term (the partial derivative of the term in square brackets), is the corresponding derivative for the unconstrained Negative Binomial (I provide more details about this derivative in this other post).
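For reference, the non-gamma part of the unconstrained summand can be recovered term by term. The lines below are only a worked check of that step, using the same symbols as above:

$$ \begin{aligned} \frac{\partial}{\partial\alpha}\left[-y_i\ln\alpha-\left(\alpha+y_i\right)\ln\left(1+\alpha^{-1}\lambda_i\right)\right] &= -\frac{y_i}{\alpha}-\ln\left(1+\alpha^{-1}\lambda_i\right)+\frac{\left(\alpha+y_i\right)\lambda_i}{\alpha\left(\lambda_i+\alpha\right)} \\ &= \ln\alpha-\ln\left(\lambda_i+\alpha\right)+\frac{\lambda_i-y_i}{\lambda_i+\alpha} \\ &= \ln\alpha+1-\ln\left(\lambda_i+\alpha\right)-\frac{\alpha+y_i}{\lambda_i+\alpha} \end{aligned} $$

The last line uses $1-\frac{\alpha+y_i}{\lambda_i+\alpha}=\frac{\lambda_i-y_i}{\lambda_i+\alpha}$, so this part agrees with the expression above.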

The first result I am unsure of, and wish to verify, is the partial derivative in the previous expression:

\begin{aligned} \frac{\partial}{\partial\alpha}\ln\left[1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}\right] &= -\frac{\frac{\partial}{\partial\alpha}\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}}{1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}} \\& =\frac{\left(\frac{1}{1+\alpha^{-1}\lambda_i}\right)^\alpha \ln\left(\frac{1}{1+\alpha^{-1}\lambda_i}\right)\left[\frac{\lambda_i}{\alpha^2}\frac{1}{{(1+\alpha^{-1}\lambda_i)}^2}\right]}{1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}} \\ & = \textrm{E}\left[Y_i|Y_i > 0\right]\frac{\ln\ \left(1+\alpha^{-1}\lambda_i\right)}{\alpha^2\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}} \end{aligned}

where $\textrm{E}\left[Y_i|Y_i > 0\right]=\frac{\lambda_i}{1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}}$ is the mean of the zero-truncated p.m.f. shown earlier.
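Since both the base and the exponent of $\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}$ depend on $\alpha$, one alternative route is logarithmic differentiation. The sketch below is offered for comparison with the expression above rather than as a settled result; $P_{0i}$ is shorthand introduced here for the probability of a zero count:

$$ \begin{aligned} \ln P_{0i} &= -\alpha\ln\left(1+\alpha^{-1}\lambda_i\right), \qquad P_{0i}\equiv\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha} \\ \frac{\partial}{\partial\alpha}\ln P_{0i} &= -\ln\left(1+\alpha^{-1}\lambda_i\right)+\frac{\lambda_i}{\lambda_i+\alpha} \\ \frac{\partial}{\partial\alpha}\ln\left[1-P_{0i}\right] &= -\frac{P_{0i}}{1-P_{0i}}\frac{\partial}{\partial\alpha}\ln P_{0i} = \frac{P_{0i}}{1-P_{0i}}\left[\ln\left(1+\alpha^{-1}\lambda_i\right)-\frac{\lambda_i}{\lambda_i+\alpha}\right] \end{aligned} $$

The second line applies the product rule to $-\alpha\ln\left(1+\alpha^{-1}\lambda_i\right)$, with $\frac{\partial}{\partial\alpha}\ln\left(1+\alpha^{-1}\lambda_i\right)=-\frac{\lambda_i}{\alpha\left(\lambda_i+\alpha\right)}$.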

I am not sure if my computations are correct. The only point of comparison I have is equations A.2 in this paper, which is not explicit: the form of my solution seems consistent with it, but the derivative itself cannot be verified from it.
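Absent an explicit published formula, a numerical cross-check is one way to test a candidate closed form. The sketch below compares a central finite difference of $\ln\left[1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}\right]$ with two candidates: the last line of the derivation above, and an alternative obtained by first differentiating $-\alpha\ln\left(1+\alpha^{-1}\lambda_i\right)$ (logarithmic differentiation). All function names, the step size, and the test point are assumptions made for illustration.

```python
import math

def log_trunc(alpha, lam):
    """ln[1 - (1 + lam/alpha)^(-alpha)]: the truncation term inside the log-likelihood."""
    return math.log(1.0 - (1.0 + lam / alpha) ** (-alpha))

def central_diff(f, x, h=1e-6):
    """Two-sided finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def closed_form_post(alpha, lam):
    """Last line of the derivation in the post."""
    r = 1.0 + lam / alpha
    trunc_mean = lam / (1.0 - r ** (-alpha))  # E[Y | Y > 0]
    return trunc_mean * math.log(r) / (alpha ** 2 * r ** (alpha + 2.0))

def closed_form_logdiff(alpha, lam):
    """Alternative candidate via logarithmic differentiation (assumption, not from the post)."""
    r = 1.0 + lam / alpha
    p0 = r ** (-alpha)                        # Pr(Y = 0) before truncation
    return p0 / (1.0 - p0) * (math.log(r) - lam / (lam + alpha))

lam, alpha = 1.0, 1.0                         # arbitrary test point
numeric = central_diff(lambda a: log_trunc(a, lam), alpha)
print("finite difference:", numeric)
print("post closed form: ", closed_form_post(alpha, lam))
print("log-diff form:    ", closed_form_logdiff(alpha, lam))
```

At this test point the finite difference is roughly 0.193, which the log-differentiation form reproduces, while the closed form from the post gives roughly 0.173. Trying a few more $(\lambda_i, \alpha)$ pairs would show whether the gap is systematic; if it is, the chain-rule step for $\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}$ is the natural place to look.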

Second derivative

The second derivative with respect to $\alpha$ may be obtained by differentiating the previous expression:

\begin{aligned} \frac{\partial^2}{\partial\alpha^2}\ell{\left(\boldsymbol{\beta},\alpha\right)} & =\sum_{i}\left\{\Psi^\prime\left(y_i+\alpha\right)-\Psi^\prime\left(\alpha\right)+\frac{1}{\alpha}-\frac{2}{\left(\lambda_i+\alpha\right)}+\frac{\alpha+y_i}{\left(\lambda_i+\alpha\right)^2}-\frac{\partial}{\partial\alpha}\left[\textrm{E}\left[Y_i|Y_i>0\right]\frac{\ln\ \left(1+\alpha^{-1}\lambda_i\right)}{\alpha^2\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}}\right]\right\} \end{aligned}

where $\Psi^\prime(\cdot)$ is the trigamma function. Except for the partial derivative in the last term in square brackets, the summand above is the corresponding second derivative for the unconstrained Negative Binomial (I touch on this derivative in this other post).

The second result I wish to verify is this portion of the previous expression:

\begin{aligned} \frac{\partial}{\partial\alpha}\left[\textrm{E}\left[Y_i|Y_i>0\right]\frac{\ln \left(1+\alpha^{-1}\lambda_i\right)}{\alpha^2\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}}\right] & = \frac{\lambda_i\frac{\partial}{\partial\alpha}\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}}{\left[1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}\right]^2}\left[\frac{\ln \left(1+\alpha^{-1}\lambda_i\right)}{\alpha^2\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}}\right] + \\ & \quad E\left[Y_i|Y_i>0\right]\left[\frac{\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}\frac{\partial}{\partial\alpha}\alpha^{-2}\ln \left(1+\alpha^{-1}\lambda_i\right)}{\left(1+\alpha^{-1}\lambda_i\right)^{2\left(\alpha+2\right)}} + \right. \\ & \quad \left. -\frac{\alpha^{-2}\ln \left(1+\alpha^{-1}\lambda_i\right)\frac{\partial}{\partial\alpha}\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}}{\left(1+\alpha^{-1}\lambda_i\right)^{2\left(\alpha+2\right)}}\right] \\ & = -\left[E\left[\operatorname{Y}_i|\operatorname{Y}_i>0\right]\frac{\ln \left(1+\alpha^{-1}\lambda_i\right)}{\alpha^2\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}}\right]^2 \\ & \quad -E\left[Y_i|Y_i>0\right]\frac{1}{\alpha^3}\left[\frac{2\ln \left(1+\alpha^{-1}\lambda_i\right)}{\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}}+\frac{\alpha^{-1}\lambda_i}{\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+3}}\right] \\ & \quad +E\left[Y_i|Y_i>0\right]\frac{\lambda_i}{\alpha^4}\left\{\frac{\left[\ln \left(1+\alpha^{-1}\lambda_i\right)\right]^2}{\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+2}}+\frac{2\ln \left(1+\alpha^{-1}\lambda_i\right)}{\left(1+\alpha^{-1}\lambda_i\right)^{\alpha+3}}\right\} \end{aligned}

Once again, the only source for verification I am aware of is equations A.3 in this paper, which is not explicit. That source seems to confirm the form of the first two terms in the last expression above, but not the last term. Yet it provides no guidance for evaluating whether my derivatives are correct.
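The same kind of numerical check can be applied to the second derivative of the truncation term. The sketch below compares a central second difference of $\ln\left[1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}\right]$ with a candidate closed form built by logarithmic differentiation. That closed form is an assumption offered for comparison, not a result taken from the cited paper, and the function names, step size, and test point are arbitrary.

```python
import math

def log_trunc(alpha, lam):
    """ln[1 - (1 + lam/alpha)^(-alpha)]: the truncation term inside the log-likelihood."""
    return math.log(1.0 - (1.0 + lam / alpha) ** (-alpha))

def second_diff(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def second_deriv_logdiff(alpha, lam):
    """Candidate second derivative of log_trunc w.r.t. alpha (assumption)."""
    r = 1.0 + lam / alpha
    p0 = r ** (-alpha)                              # Pr(Y = 0) before truncation
    g1 = lam / (lam + alpha) - math.log(r)          # first derivative of ln p0
    g2 = lam ** 2 / (alpha * (lam + alpha) ** 2)    # second derivative of ln p0
    first = -p0 * g1 / (1.0 - p0)                   # first derivative of ln(1 - p0)
    return -p0 * (g1 ** 2 + g2) / (1.0 - p0) - first ** 2

lam, alpha = 1.0, 1.0                               # arbitrary test point
numeric = second_diff(lambda a: log_trunc(a, lam), alpha)
analytic = second_deriv_logdiff(alpha, lam)
print("second difference:", numeric)
print("candidate form:   ", analytic)
```

In this candidate the squared first derivative enters with a minus sign, which has the same form as the first term of the last expression above. Because the log-likelihood contains $-\ln\left[1-\left(1+\alpha^{-1}\lambda_i\right)^{-\alpha}\right]$, the whole quantity changes sign when it is added to the unconstrained part.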

Grateful for any pointer that may help me check if my computations are correct.

zero-truncated negative binomial regression: scoring and information equations in the dispersion parameter
