Title: | Double Generalized Linear Models |
---|---|
Description: | Model fitting and evaluation tools for double generalized linear models (DGLMs). This class of models uses one generalized linear model (GLM) to fit the specified response and a second GLM to fit the deviance of the first model. |
Authors: | Gordon Smyth, Peter K Dunn <[email protected]>, Robert W. Corty |
Maintainer: | Gordon Smyth <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.8.6 |
Built: | 2025-02-18 03:55:15 UTC |
Source: | https://github.com/cran/dglm |
Compute an analysis of deviance table for one or more double generalized linear model fits.
## S3 method for class 'dglm' anova(object, ...)
object | objects of class "dglm". |
... | Not used. |
Specifying a single object gives sequential and adjusted likelihood ratio tests for the mean and dispersion model components of the fit. The aim is to test overall significance for the mean and dispersion components of the double generalized linear model fit. The sequential tests (i) set both mean and dispersion models constant, add the mean model and (ii) sequentially add the dispersion model. The adjusted tests determine whether the mean and dispersion models can be set constant separately.
An object of class "anova" inheriting from class "data.frame".
The anova method is questionable when applied to a "dglm" object with method="reml" (stick to method="ml").
Gordon Smyth, ported to R by Peter Dunn ([email protected])
Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S, edited by J. M. Chambers and T. J. Hastie, Wadsworth and Brooks/Cole.
Smyth, G. K. (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51, 47–60. doi:10.1111/j.2517-6161.1989.tb01747.x
Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics, 10, 696-709. doi:10.1002/(SICI)1099-095X(199911/12)10:6<695::AID-ENV385>3.0.CO;2-M https://gksmyth.github.io/pubs/Ties98-Preprint.pdf
Smyth, G. K., and Verbyla, A. P. (1999). Double generalized linear models: approximate REML and diagnostics. In Statistical Modelling: Proceedings of the 14th International Workshop on Statistical Modelling, Graz, Austria, July 19-23, 1999, H. Friedl, A. Berghold, G. Kauermann (eds.), Technical University, Graz, Austria, pages 66-80. https://gksmyth.github.io/pubs/iwsm99-Preprint.pdf
Fits a generalized linear model with a link-linear model for the dispersion as well as for the mean.
dglm(formula = formula(data), dformula = ~ 1, family = gaussian,
     dlink = "log", data = parent.frame(), subset = NULL, weights = NULL,
     contrasts = NULL, method = "ml", mustart = NULL, betastart = NULL,
     etastart = NULL, phistart = NULL, control = dglm.control(...),
     ykeep = TRUE, xkeep = FALSE, zkeep = FALSE, ...)

dglm.constant(y, family, weights = 1)
formula | a symbolic description of the model to be fit. The details of model specification are found in |
dformula | a formula expression of the form |
family | a description of the error distribution and link function to be used in the model. See |
dlink | link function for modelling the dispersion. Any link function accepted by the |
data | an optional data frame containing the variables in the model. See |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
weights | an optional vector of weights to be used in the fitting process. |
contrasts | an optional list. See the |
method | the method used to estimate the dispersion parameters; the default is |
mustart | numeric vector giving starting values for the fitted values or expected responses. Must be of the same length as the response, or of length 1 if a constant starting vector is desired. Ignored if |
betastart | numeric vector giving starting values for the regression coefficients in the link-linear model for the mean. |
etastart | numeric vector giving starting values for the linear predictor for the mean model. |
phistart | numeric vector giving starting values for the dispersion parameters. |
control | a list of iteration and algorithmic constants. See |
ykeep | logical flag: if |
xkeep | logical flag: if |
zkeep | logical flag: if |
... | further arguments passed to or from other methods. |
y | numeric response vector |
Write $\mu_i = \mathrm{E}[y_i]$ for the expectation of the $i$th response. Then $\mathrm{Var}[y_i] = \phi_i V(\mu_i)$, where $V$ is the variance function and $\phi_i$ is the dispersion of the $i$th response (often denoted as the Greek character 'phi'). We assume the link-linear models $g(\mu_i) = x_i^T \beta$ and $h(\phi_i) = z_i^T \alpha$, where $x_i$ and $z_i$ are vectors of covariates, and $\beta$ and $\alpha$ are vectors of regression coefficients affecting the mean and dispersion respectively. The argument dlink specifies $h$. See family for how to specify $g$. The optional arguments mustart, betastart and phistart specify starting values for $\mu_i$, $\beta$ and $\phi_i$ respectively.

The parameters $\beta$ are estimated as for an ordinary glm. The parameters $\alpha$ are estimated by way of a dual glm in which the deviance components of the ordinary glm appear as responses. The estimation procedure alternates between one iteration for the mean submodel and one iteration for the dispersion submodel until overall convergence.
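The alternating scheme can be illustrated with base R alone. The following is a minimal sketch for a Gaussian response, not the package's implementation: the simulated data, the variable names (x, z, phi, mean_fit, disp_fit) and the convergence rule are all assumptions made for illustration, and each submodel is refitted to convergence with glm at every pass, whereas dglm performs a single iteration per submodel.

```r
# Simulated Gaussian data: the mean depends on x, the variance on z
# (all names and values here are illustrative assumptions).
set.seed(1)
n <- 500
x <- runif(n)
z <- runif(n)
y <- rnorm(n, mean = 1 + 2 * x, sd = sqrt(exp(-1 + 1.5 * z)))

phi <- rep(var(y), n)   # constant starting value for the dispersions
for (iter in 1:100) {
  # Mean submodel: Gaussian GLM with prior weights 1 / phi
  mean_fit <- glm(y ~ x, family = gaussian, weights = 1 / phi)

  # For a Gaussian response the deviance components are squared residuals
  d <- (y - fitted(mean_fit))^2

  # Dispersion submodel: gamma GLM with log link,
  # using the deviance components as responses
  disp_fit <- glm(d ~ z, family = Gamma(link = "log"))

  phi_new <- fitted(disp_fit)
  converged <- max(abs(log(phi_new) - log(phi))) < 1e-8
  phi <- phi_new
  if (converged) break
}

coef(mean_fit)   # coefficients of the mean submodel
coef(disp_fit)   # coefficients of the dispersion submodel (log scale)
```

At a fixed point of this loop the two sets of estimating equations are satisfied simultaneously, which is the sense in which the two submodels are coupled.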
The output from dglm, out say, consists of two glm objects (that for the dispersion submodel is out$dispersion.fit) with a few more components for the outer iteration and overall likelihood. The summary and anova functions have special methods for dglm objects. Any generic function that has methods for glms or lms will work on out, giving information about the mean submodel. Information about the dispersion submodel can be obtained by using out$dispersion.fit as argument rather than out itself. In particular drop1(out,scale=1) gives correct score statistics for removing terms from the mean submodel, while drop1(out$dispersion.fit,scale=2) gives correct score statistics for removing terms from the dispersion submodel.
The dispersion submodel is treated as a gamma family unless the original responses are gamma, in which case the dispersion submodel is digamma. This is exact if the original glm family is gaussian, Gamma or inverse.gaussian. In other cases it can be justified by the saddle-point approximation to the density of the responses. The results will therefore be close to exact ML or REML when the dispersions are small compared to the means. In all cases the dispersion submodel has prior weights 1, and has its own dispersion parameter, which is 2.
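The value 2 can be checked by hand in the Gaussian case. This is a sketch, writing $\mu_i$ and $\phi_i$ for the mean and dispersion of the $i$th response; $d_i$ (the deviance component) and $\psi$ (the dispersion of the submodel) are notation introduced here for illustration.

```latex
d_i = (y_i - \mu_i)^2, \qquad \frac{d_i}{\phi_i} \sim \chi^2_1
\;\Longrightarrow\;
\mathrm{E}[d_i] = \phi_i, \qquad \mathrm{Var}[d_i] = 2\,\phi_i^2 .
```

A gamma family with mean $\phi_i$ and dispersion $\psi$ has variance $\psi\,\phi_i^2$, so matching the two variances gives $\psi = 2$.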
An object of class dglm is returned, which inherits from glm and lm. See dglm-class for details.
The anova method is questionable when applied to a dglm object with method="reml" (stick to method="ml").
Gordon Smyth, ported to R by Peter Dunn
Smyth, G. K. (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51, 47–60. doi:10.1111/j.2517-6161.1989.tb01747.x
Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics, 10, 696-709. doi:10.1002/(SICI)1099-095X(199911/12)10:6<695::AID-ENV385>3.0.CO;2-M https://gksmyth.github.io/pubs/Ties98-Preprint.pdf
Smyth, G. K., and Verbyla, A. P. (1999). Double generalized linear models: approximate REML and diagnostics. In Statistical Modelling: Proceedings of the 14th International Workshop on Statistical Modelling, Graz, Austria, July 19-23, 1999, H. Friedl, A. Berghold, G. Kauermann (eds.), Technical University, Graz, Austria, pages 66-80. https://gksmyth.github.io/pubs/iwsm99-Preprint.pdf
dglm-class, dglm.control, Digamma family, Polygamma.
See https://gksmyth.github.io/s/dglm.html for the original S-Plus code.
# Continuing the example from glm, but this time try
# fitting a Gamma double generalized linear model also.
clotting <- data.frame(
    u = c(5,10,15,20,30,40,60,80,100),
    lot1 = c(118,58,42,35,27,25,21,19,18),
    lot2 = c(69,35,26,21,18,16,13,12,12))

# The same example as in glm: the dispersion is modelled as constant
# However, dglm uses ml not reml, so the results are slightly different:
out <- dglm(lot1 ~ log(u), ~1, data=clotting, family=Gamma)
summary(out)

# Try a double glm
out2 <- dglm(lot1 ~ log(u), ~u, data=clotting, family=Gamma)
summary(out2)
anova(out2)

# Summarize the mean model as for a glm
summary.glm(out2)

# Summarize the dispersion model as for a glm
summary(out2$dispersion.fit)

# Examine goodness of fit of dispersion model by plotting residuals
plot(fitted(out2$dispersion.fit),residuals(out2$dispersion.fit))
Class of objects returned by fitting double generalized linear models.
This class of objects is returned by the dglm function to represent a fitted double generalized linear model. Class "dglm" inherits from class "glm", since it consists of two coupled generalized linear models, one for the mean and one for the dispersion. Like glm, it also inherits from lm. The object returned has all the components of a glm object. The returned component object$dispersion.fit is also a glm object in its own right, representing the result of modelling the dispersion.

Objects of this class have methods for the functions print, plot, summary, anova, predict, fitted, drop1, add1, and step, amongst others. Specific methods (not shared with glm) exist for summary and anova.

A dglm object consists of a glm object with the following additional components:
dispersion.fit | the dispersion submodel: a glm object representing the fitted model for the dispersions. The responses for this model are the deviance components from the original generalized linear model. The prior weights are 1 and the dispersion or scale of this model is 2. |
iter | this component now represents the number of outer iterations used to fit the coupled mean-dispersion models. At each outer iteration, one IRLS is done for each of the mean and dispersion submodels. |
method | fitting method used: "ml" if maximum likelihood was used or "reml" if adjusted profile likelihood was used. |
m2loglik | minus twice the log-likelihood or adjusted profile likelihood of the fitted model. |
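The additional components can be inspected directly on a fitted object. The sketch below assumes the dglm package is installed and reuses the clotting data from the package examples; the object name fit is an arbitrary choice.

```r
library(dglm)

# Clotting data as used in the dglm examples
clotting <- data.frame(
  u = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18))

# Mean model in log(u), dispersion model in u
fit <- dglm(lot1 ~ log(u), ~ u, data = clotting, family = Gamma)

class(fit)                 # class vector of the fitted object
fit$method                 # fitting method used
fit$iter                   # number of outer iterations
fit$m2loglik               # minus twice the log-likelihood
class(fit$dispersion.fit)  # the dispersion submodel is itself a glm object
```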
The anova method is questionable when applied to a dglm object with method="reml" (stick to method="ml").
Gordon Smyth, ported to R by Peter Dunn ([email protected])
Smyth, G. K. (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51, 47–60. doi:10.1111/j.2517-6161.1989.tb01747.x
Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics, 10, 696-709. doi:10.1002/(SICI)1099-095X(199911/12)10:6<695::AID-ENV385>3.0.CO;2-M https://gksmyth.github.io/pubs/Ties98-Preprint.pdf
Smyth, G. K., and Verbyla, A. P. (1999). Double generalized linear models: approximate REML and diagnostics. In Statistical Modelling: Proceedings of the 14th International Workshop on Statistical Modelling, Graz, Austria, July 19-23, 1999, H. Friedl, A. Berghold, G. Kauermann (eds.), Technical University, Graz, Austria, pages 66-80. https://gksmyth.github.io/pubs/iwsm99-Preprint.pdf
dglm, Digamma family, Polygamma
Auxiliary function as user interface for fitting double generalized linear models. Typically only used when calling dglm.
dglm.control(epsilon = 1e-007, maxit = 50, trace = FALSE, ...)
epsilon | positive convergence tolerance epsilon; the iterations converge when |
maxit | integer giving the maximal number of outer iterations of the alternating iterations. |
trace | logical indicating if (a small amount of) output should be produced for each iteration. |
... | not currently implemented |
When 'trace' is true, calls to 'cat' produce the output for each outer iteration. Hence, 'options(digits = *)' can be used to increase the precision; see the example for glm.control.
Gordon Smyth, ported to R by Peter Dunn ([email protected])
Smyth, G. K. (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51, 47–60.
Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics, 10, 696-709.
Verbyla, A. P., and Smyth, G. K. (1998). Double generalized linear models: approximate residual maximum likelihood and diagnostics. Research Report, Department of Statistics, University of Adelaide.
### A variation on example(dglm) :
# Continuing the example from glm, but this time try
# fitting a Gamma double generalized linear model also.
clotting <- data.frame(
    u = c(5,10,15,20,30,40,60,80,100),
    lot1 = c(118,58,42,35,27,25,21,19,18),
    lot2 = c(69,35,26,21,18,16,13,12,12))

# The same example as in glm: the dispersion is modelled as constant
out <- dglm(lot1 ~ log(u), ~1, data=clotting, family=Gamma)
summary(out)

# Try a double glm
oo <- options()
options(digits=12) # See more details in tracing
out2 <- dglm(lot1 ~ log(u), ~u, data=clotting, family=Gamma,
             control=dglm.control(epsilon=0.01, trace=TRUE))
# With this value of epsilon, convergence should be quicker
# and the results less reliable (compare to example(dglm) )

summary(out2)
options(oo)
This implements the 'residuals' generic for the dglm object
## S3 method for class 'dglm' residuals(object, ...)
object | an object of class "dglm" |
... | any other parameters are passed to |
Numeric vector of residuals from the mean submodel.
Robert W. Corty and Gordon Smyth
Summarize objects of class "dglm".
## S3 method for class 'dglm' summary(object, dispersion=NULL, correlation = FALSE, ...)
object | an object of class "dglm" |
dispersion | the dispersion parameter for the fitting family. By default it is obtained from |
correlation | logical; if |
... | further arguments to be passed to |
For more details, see summary.glm.

If more than one of etastart, start and mustart is specified, the first in the list will be used.
An object of class "summary.dglm", which is a list with the following components:
call | the component from |
terms | the component from |
family | the component from |
deviance | the component from |
aic | |
contrasts | (where relevant) the contrasts used. NOT WORKING?? |
df.residual | the component from |
null.deviance | the component from |
df.null | the residual degrees of freedom for the null model. |
iter | the component from |
deviance.resid | the deviance residuals: see |
coefficients | the matrix of coefficients, standard errors, |
aliased | named logical vector showing if the original coefficients are aliased. |
dispersion | either the supplied argument or the estimated dispersion if the latter in |
df | a 3-vector of the rank of the model and the number of residual degrees of freedom, plus number of non-aliased coefficients. |
cov.unscaled | the unscaled ( |
cov.scaled | ditto, scaled by |
correlation | (only if |
dispersion.summary | the summary of the fitted dispersion model |
outer.iter | the number of outer iterations of the alternating iterations |
m2loglik | minus twice the log-likelihood of the fitted model |
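The components listed above can be pulled out of a summary object by name. The sketch below assumes the dglm package is installed and reuses the clotting data from the package examples; the object names fit and s are arbitrary.

```r
library(dglm)

# Clotting data as used in the dglm examples
clotting <- data.frame(
  u = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18))

fit <- dglm(lot1 ~ log(u), ~ u, data = clotting, family = Gamma)
s <- summary(fit)

class(s)              # class of the summary object
s$coefficients        # coefficient table for the mean submodel
s$dispersion.summary  # summary of the fitted dispersion submodel
s$outer.iter          # number of outer iterations
s$m2loglik            # minus twice the log-likelihood
```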
The anova method is questionable when applied to a dglm object created with method="reml" (stick to method="ml").
Gordon Smyth, ported to R by Peter Dunn ([email protected])
Smyth, G. K. (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51, 47–60. doi:10.1111/j.2517-6161.1989.tb01747.x
Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics, 10, 696-709. doi:10.1002/(SICI)1099-095X(199911/12)10:6<695::AID-ENV385>3.0.CO;2-M https://gksmyth.github.io/pubs/Ties98-Preprint.pdf
Smyth, G. K., and Verbyla, A. P. (1999). Double generalized linear models: approximate REML and diagnostics. In Statistical Modelling: Proceedings of the 14th International Workshop on Statistical Modelling, Graz, Austria, July 19-23, 1999, H. Friedl, A. Berghold, G. Kauermann (eds.), Technical University, Graz, Austria, pages 66-80. https://gksmyth.github.io/pubs/iwsm99-Preprint.pdf