Constrained Quadratic Ordination
rcqo.RdRandom generation for constrained quadratic ordination (CQO).
Usage
rcqo(n, p, S, Rank = 1,
family = c("poisson", "negbinomial", "binomial-poisson",
"Binomial-negbinomial", "ordinal-poisson",
"Ordinal-negbinomial", "gamma2"),
eq.maximums = FALSE, eq.tolerances = TRUE, es.optimums = FALSE,
lo.abundance = if (eq.maximums) hi.abundance else 10,
hi.abundance = 100, sd.latvar = head(1.5/2^(0:3), Rank),
sd.optimums = ifelse(es.optimums, 1.5/Rank, 1) *
ifelse(scale.latvar, sd.latvar, 1),
sd.tolerances = 0.25, Kvector = 1, Shape = 1,
sqrt.arg = FALSE, log.arg = FALSE, rhox = 0.5, breaks = 4,
seed = NULL, optimums1.arg = NULL, Crow1positive = TRUE,
xmat = NULL, scale.latvar = TRUE)Arguments
- n
Number of sites. It is denoted by \(n\) below.
- p
Number of environmental variables, including an intercept term. It is denoted by \(p\) below. Must be no less than \(1+R\) in value.
- S
Number of species. It is denoted by \(S\) below.
- Rank
The rank or the number of latent variables or true dimension of the data on the reduced space. This must be either 1, 2, 3 or 4. It is denoted by \(R\).
- family
What type of species data is to be returned. The first choice is the default. If binomial then a 0 means absence and 1 means presence. If ordinal then the
breaksargument is passed into thebreaksargument ofcut. Note that either the Poisson or negative binomial distributions are used to generate binomial and ordinal data, and that an upper-case choice is used for the negative binomial distribution (this makes it easier for the user). If"gamma2"then this is the 2-parameter gamma distribution.- eq.maximums
Logical. Does each species have the same maximum? See arguments
lo.abundanceandhi.abundance.- eq.tolerances
Logical. Does each species have the same tolerance? If
TRUEthen the common value is 1 along every latent variable, i.e., all species' tolerance matrices are the order-\(R\) identity matrix.- es.optimums
Logical. Do the species have equally spaced optimums? If
TRUEthen the quantity \(S^{1/R}\) must be an integer with value 2 or more. That is, there has to be an appropriate number of species in total. This is so that a grid of optimum values is possible in \(R\)-dimensional latent variable space in order to place the species' optimums. Also see the argumentsd.tolerances.- lo.abundance, hi.abundance
Numeric. These are recycled to a vector of length \(S\). The species have a maximum between
lo.abundanceandhi.abundance. That is, at their optimal environment, the mean abundance of each species is between the two componentwise values. Ifeq.maximumsisTRUEthenlo.abundanceandhi.abundancemust have the same values. Ifeq.maximumsisFALSEthen the logarithm of the maximums are uniformly distributed betweenlog(lo.abundance)andlog(hi.abundance).- sd.latvar
Numeric, of length \(R\) (recycled if necessary). Site scores along each latent variable have these standard deviation values. This must be a decreasing sequence of values because the first ordination axis contains the greatest spread of the species' site scores, followed by the second axis, followed by the third axis, etc.
- sd.optimums
Numeric, of length \(R\) (recycled if necessary). If
es.optimums = FALSEthen, for the \(r\)th latent variable axis, the optimums of the species are generated from a normal distribution centered about 0. Ifes.optimums = TRUEthen the \(S\) optimums are equally spaced about 0 along every latent variable axis. Regardless of the value ofes.optimums, the optimums are then scaled to give standard deviationsd.optimums[r].- sd.tolerances
Logical. If
eq.tolerances = FALSEthen, for the \(r\)th latent variable, the species' tolerances are chosen from a normal distribution with mean 1 and standard deviationsd.tolerances[r]. However, the first speciesy1has its tolerance matrix set equal to the order-\(R\) identity matrix. All tolerance matrices for all species are diagonal in this function. This argument is ignored ifeq.tolerancesisTRUE, otherwise it is recycled to length \(R\) if necessary.- Kvector
A vector of positive \(k\) values (recycled to length \(S\) if necessary) for the negative binomial distribution (see
negbinomialfor details). Note that a natural default value does not exist, however the default value here is probably a realistic one, and that for large values of \(\mu\) one has \(Var(Y)=\mu^2/k\) approximately.- Shape
A vector of positive \(\lambda\) values (recycled to length \(S\) if necessary) for the 2-parameter gamma distribution (see
gamma2for details). Note that a natural default value does not exist, however the default value here is probably a realistic one, and that \(Var(Y) = \mu^2 / \lambda\).- sqrt.arg
Logical. Take the square-root of the negative binomial counts? Assigning
sqrt.arg = TRUEwhenfamily="negbinomial"means that the resulting species data can be considered very crudely to be approximately Poisson distributed. They will not integers in general but much easier (less numerical problems) to estimate using something likecqo(..., family="poissonff").- log.arg
Logical. Take the logarithm of the gamma random variates? Assigning
log.arg = TRUEwhenfamily="gamma2"means that the resulting species data can be considered very crudely to be approximately Gaussian distributed about its (quadratic) mean.- rhox
Numeric, less than 1 in absolute value. The correlation between the environmental variables. The correlation matrix is a matrix of 1's along the diagonal and
rhoxin the off-diagonals. Note that each environmental variable is normally distributed with mean 0. The standard deviation of each environmental variable is chosen so that the site scores have the determined standard deviation, as given by argumentsd.latvar.- breaks
If
familyis assigned an ordinal value then this argument is used to define the cutpoints. It is fed into thebreaksargument ofcut.- seed
If given, it is passed into
set.seed. This argument can be used to obtain reproducible results. If set, the value is saved as the"seed"attribute of the returned value. The default will not change the random generator state, and return.Random.seedas"seed"attribute.- optimums1.arg
If assigned and
Rank = 1then these are the explicity optimums. Recycled to lengthS.- Crow1positive
See
qrrvglm.controlfor details.- xmat
The \(n\) by \(p-1\) environmental matrix can be inputted.
- scale.latvar
Logical. If
FALSEthe argumentsd.latvaris ignored and no scaling of the latent variable values is performed.
Details
This function generates data coming from a
constrained quadratic
ordination (CQO) model. In particular,
data coming from a species packing model
can be generated
with this function.
The species packing model states that species have equal
tolerances, equal maximums, and optimums which are uniformly
distributed over the latent variable space. This can be
achieved by assigning the arguments es.optimums = TRUE,
eq.maximums = TRUE, eq.tolerances = TRUE.
At present, the Poisson and negative binomial abundances
are generated first using lo.abundance and
hi.abundance, and if family is binomial or ordinal
then it is converted into these forms.
In CQO theory the \(n\) by \(p\) matrix \(X\) is partitioned into two parts \(X_1\) and \(X_2\). The matrix \(X_2\) contains the `real' environmental variables whereas the variables in \(X_1\) are just for adjustment purposes; they contain the intercept terms and other variables that one wants to adjust for when (primarily) looking at the variables in \(X_2\). This function has \(X_1\) only being a matrix of ones, i.e., containing an intercept only.
Value
A \(n\) by \(p-1+S\) data frame with components and attributes. In the following the attributes are labelled with double quotes.
- x2, x3, x4, ..., xp
The environmental variables. This makes up the \(n\) by \(p-1\) \(X_2\) matrix. Note that
x1is not present; it is effectively a vector of ones since it corresponds to an intercept term whencqois applied to the data.- y1, y2, x3, ..., yS
The species data. This makes up the \(n\) by \(S\) matrix \(Y\). This will be of the form described by the argument
family.- "concoefficients"
The \(p-1\) by \(R\) matrix of constrained coefficients (or canonical coefficients). These are also known as weights or loadings.
- "formula"
The formula involving the species and environmental variable names. This can be used directly in the
formulaargument ofcqo.- "log.maximums"
The \(S\)-vector of species' maximums, on a log scale. These are uniformly distributed between
log(lo.abundance)andlog(hi.abundance).- "latvar"
The \(n\) by \(R\) matrix of site scores. Each successive column (latent variable) has sample standard deviation equal to successive values of
sd.latvar.- "eta"
The linear/additive predictor value.
- "optimums"
The \(S\) by \(R\) matrix of species' optimums.
- "tolerances"
The \(S\) by \(R\) matrix of species' tolerances. These are the square root of the diagonal elements of the tolerance matrices (recall that all tolerance matrices are restricted to being diagonal in this function).
Other attributes are "break",
"family", "Rank",
"lo.abundance", "hi.abundance",
"eq.tolerances", "eq.maximums",
"seed" as used.
References
Yee, T. W. (2004). A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.
Yee, T. W. (2006). Constrained additive ordination. Ecology, 87, 203–213.
ter Braak, C. J. F. and Prentice, I. C. (1988). A theory of gradient analysis. Advances in Ecological Research, 18, 271–317.
Note
This function is under development and is not finished yet. There may be a few bugs.
Yet to do: add an argument that allows absences to be equal to the first level if ordinal data is requested.
Examples
if (FALSE) { # \dontrun{
# Example 1: Species packing model:
n <- 100; p <- 5; S <- 5
mydata <- rcqo(n, p, S, es.opt = TRUE, eq.max = TRUE)
names(mydata)
(myform <- attr(mydata, "formula"))
fit <- cqo(myform, poissonff, mydata, Bestof = 3) # eq.tol = TRUE
matplot(attr(mydata, "latvar"), mydata[,-(1:(p-1))], col = 1:S)
persp(fit, col = 1:S, add = TRUE)
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S) # Same plot as above
# Compare the fitted model with the 'truth'
concoef(fit) # The fitted model
attr(mydata, "concoefficients") # The 'truth'
c(apply(attr(mydata, "latvar"), 2, sd),
apply(latvar(fit), 2, sd)) # Both values should be approx equal
# Example 2: negative binomial data fitted using a Poisson model:
n <- 200; p <- 5; S <- 5
mydata <- rcqo(n, p, S, fam = "negbin", sqrt = TRUE)
myform <- attr(mydata, "formula")
fit <- cqo(myform, fam = poissonff, dat = mydata) # I.tol = TRUE,
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S)
# Compare the fitted model with the 'truth'
concoef(fit) # The fitted model
attr(mydata, "concoefficients") # The 'truth'
} # }