Select Variables for a Formula Response or the RHS of a Formula
Select variables from a data frame whose names begin with a certain character string.
Usage
Select(data = list(), prefix = "y",
lhs = NULL, rhs = NULL, rhs2 = NULL, rhs3 = NULL,
as.character = FALSE, as.formula.arg = FALSE, tilde = TRUE,
exclude = NULL, sort.arg = TRUE)
Arguments
- data
A data frame or a matrix.
- prefix
A vector of character strings, or a logical. If character, then the variables chosen from
data begin with the value of prefix. If logical, then only TRUE is accepted and all the variables in data are chosen.
- lhs
A character string. The response of a formula.
- rhs
A character string. Included as part of the RHS of a formula. Set
rhs = "0" to suppress the intercept.
- rhs2, rhs3
Same as
rhs but appended to its RHS, i.e., paste0(rhs, " + ", rhs2, " + ", rhs3). If used, rhs should be used first, and then possibly rhs2 and then possibly rhs3.
- as.character
Logical. Return the answer as a character string?
- as.formula.arg
Logical. Is the answer a formula?
- tilde
Logical. If
as.character and as.formula.arg are both TRUE then include the tilde in the formula?
- exclude
Vector of character strings. Exclude these variables explicitly.
- sort.arg
Logical. Sort the variables?
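The way rhs, rhs2 and rhs3 combine can be sketched in base R. This is only an illustration of the paste0() expression quoted above and of the effect of "0" on the intercept. The values of rhs2 and rhs3 and the response name y are made up, and the sketch does not call Select() itself.

```r
# Hypothetical values for the three RHS arguments (illustration only).
rhs  <- "0"    # "0" suppresses the intercept
rhs2 <- "x2"
rhs3 <- "x3"

# The combination described for rhs, rhs2 and rhs3.
rhs_all <- paste0(rhs, " + ", rhs2, " + ", rhs3)

# Attach a (hypothetical) response by hand and convert the string to a formula.
f <- as.formula(paste("y ~", rhs_all))

# terms() records whether an intercept is present: 1 = yes, 0 = no.
has_intercept <- attr(terms(f), "intercept")

print(rhs_all)        # "0 + x2 + x3"
print(has_intercept)  # 0
```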
Details
This is meant as a utility function to avoid manually:
(i) making a cbind call to construct
a big matrix response,
and
(ii) constructing a formula involving a lot of terms.
The savings can be made because the variables of interest
begin with some prefix, e.g., with the character "y".
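As a rough base R sketch of the idea (not the package's implementation), prefix-based selection amounts to matching column names and binding the matching columns together. The data frame toy and its columns are hypothetical.

```r
# Toy data frame (hypothetical; not from the package).
toy <- data.frame(y1 = c(1, 2, 3),
                  y2 = c(4, 5, 6),
                  x1 = c(7, 8, 9),
                  y3 = c(0, 1, 0))

# Keep the columns whose names begin with the prefix "y".
keep <- startsWith(names(toy), "y")
resp <- as.matrix(toy[, keep, drop = FALSE])

# The manual construction that the prefix-based selection avoids.
by_hand <- with(toy, cbind(y1, y2, y3))

print(colnames(resp))        # "y1" "y2" "y3"
print(all(resp == by_hand))  # TRUE
```

With only three columns the manual cbind() call is short, but the prefix match stays the same length however many response columns there are.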
Value
If as.character = FALSE and
as.formula.arg = FALSE then a matrix such
as cbind(y1, y2, y3).
If as.character = TRUE and
as.formula.arg = FALSE then a character string such
as "cbind(y1, y2, y3)".
If as.character = FALSE and
as.formula.arg = TRUE then a formula such
as lhs ~ y1 + y2 + y3.
If as.character = TRUE and
as.formula.arg = TRUE then a character string such
as "lhs ~ y1 + y2 + y3".
See the examples below.
By default, if no variables beginning with the value of prefix
are found then NULL is returned.
Setting prefix = " " is a way of selecting no variables.
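The character and formula forms listed above can be built by hand in base R, which may help when checking what to expect from each combination of as.character and as.formula.arg. This is a sketch under the assumption that three variables y1, y2, y3 were selected. It mirrors the strings shown above rather than calling Select().

```r
# Names of the selected variables (assumed for illustration).
vars <- c("y1", "y2", "y3")

# Character form of the matrix response.
cbind_string <- paste0("cbind(", paste(vars, collapse = ", "), ")")

# Character form of the formula, with "lhs" standing in for the response.
formula_string <- paste("lhs ~", paste(vars, collapse = " + "))

# The formula itself, obtained from its character form.
f <- as.formula(formula_string)

print(cbind_string)    # "cbind(y1, y2, y3)"
print(formula_string)  # "lhs ~ y1 + y2 + y3"
print(class(f))        # "formula"
```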
Note
This function is a bit experimental at this stage and
may change in the near future.
Some of its utility may be better achieved using
subset and its select argument,
e.g., subset(pdata, TRUE, select = y01:y10).
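A small sketch of the subset() alternative follows. The data frame pdata here is a made-up stand-in with only three y columns, so the range is y01:y03 rather than y01:y10. Note that a range in select picks up every column positioned between its two endpoints, so it relies on the y columns being contiguous.

```r
# Hypothetical stand-in for pdata, with contiguous y columns.
pdata <- data.frame(id  = 1:3,
                    y01 = c(1, 0, 1),
                    y02 = c(0, 0, 1),
                    y03 = c(1, 1, 0),
                    x2  = c(2.5, 3.1, 4.0))

# Keep all rows (TRUE) and the columns from y01 through y03.
ysub <- subset(pdata, TRUE, select = y01:y03)

print(names(ysub))  # "y01" "y02" "y03"
```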
For some models such as posbernoulli.t the
order of the variables in the xij argument is
crucial, so care must be taken with the
argument sort.arg.
In some instances, it may be good to rename variables
y1 to y01,
y2 to y02, etc.
when there are variables such as
y14.
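The reason for zero-padding can be seen with a short base R sketch: character sorting places y14 before y2, whereas padded names sort in numeric order. The variable names are illustrative, and method = "radix" is used only to make the ordering independent of the locale.

```r
# Unpadded names (illustrative).
old <- c("y1", "y2", "y14")

# Character sorting compares strings position by position, so "y14" precedes "y2".
sorted_old <- sort(old, method = "radix")

# Zero-pad the numeric suffix to two digits.
num <- as.integer(sub("^y", "", old))
new <- sprintf("y%02d", num)

# The padded names now sort in numeric order.
sorted_new <- sort(new, method = "radix")

print(sorted_old)  # "y1"  "y14" "y2"
print(sorted_new)  # "y01" "y02" "y14"
```

On a data frame, the same sprintf() result could be assigned to the relevant entries of names() before selecting by prefix.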
Currently subsetcol() and Select() are identical.
One of these functions might be withdrawn in the future.
Examples
Pneumo <- pneumo
colnames(Pneumo) <- c("y1", "y2", "y3", "x2") # The "y" variables are the responses
Pneumo$x1 <- 1; Pneumo$x3 <- 3; Pneumo$x <- 0; Pneumo$x4 <- 4 # Add these
Select(data = Pneumo) # Same as with(Pneumo, cbind(y1, y2, y3))
#> y1 y2 y3
#> 1 5.8 98 0
#> 2 15.0 51 2
#> 3 21.5 34 6
#> 4 27.5 35 5
#> 5 33.5 32 10
#> 6 39.5 23 7
#> 7 46.0 12 6
#> 8 51.5 4 2
Select(Pneumo, "x")
#> x x1 x2 x3 x4
#> 1 0 1 0 3 4
#> 2 0 1 1 3 4
#> 3 0 1 3 3 4
#> 4 0 1 8 3 4
#> 5 0 1 9 3 4
#> 6 0 1 8 3 4
#> 7 0 1 10 3 4
#> 8 0 1 5 3 4
Select(Pneumo, "x", sort = FALSE, as.char = TRUE)
#> [1] "cbind(x2, x1, x3, x, x4)"
Select(Pneumo, "x", exclude = "x1")
#> x x2 x3 x4
#> 1 0 0 3 4
#> 2 0 1 3 4
#> 3 0 3 3 4
#> 4 0 8 3 4
#> 5 0 9 3 4
#> 6 0 8 3 4
#> 7 0 10 3 4
#> 8 0 5 3 4
Select(Pneumo, "x", exclude = "x1", as.char = TRUE)
#> [1] "cbind(x, x2, x3, x4)"
Select(Pneumo, c("x", "y"))
#> x x1 x2 x3 x4 y1 y2 y3
#> 1 0 1 0 3 4 5.8 98 0
#> 2 0 1 1 3 4 15.0 51 2
#> 3 0 1 3 3 4 21.5 34 6
#> 4 0 1 8 3 4 27.5 35 5
#> 5 0 1 9 3 4 33.5 32 10
#> 6 0 1 8 3 4 39.5 23 7
#> 7 0 1 10 3 4 46.0 12 6
#> 8 0 1 5 3 4 51.5 4 2
Select(Pneumo, "z") # Now returns a NULL
#> NULL
Select(Pneumo, " ") # Now returns a NULL
#> NULL
Select(Pneumo, prefix = TRUE, as.formula = TRUE)
#> ~x + x1 + x2 + x3 + x4 + y1 + y2 + y3
#> <environment: 0x6141f6ec98b8>
Select(Pneumo, "x", exclude = c("x3", "x1"), as.formula = TRUE,
lhs = "cbind(y1, y2, y3)", rhs = "0")
#> cbind(y1, y2, y3) ~ x + x2 + x4 + 0
#> <environment: 0x6141f6f03800>
Select(Pneumo, "x", exclude = "x1", as.formula = TRUE, as.char = TRUE,
lhs = "cbind(y1, y2, y3)", rhs = "0")
#> [1] "cbind(y1, y2, y3) ~ x + x2 + x3 + x4 + 0"
# Now a 'real' example:
Huggins89table1 <- transform(Huggins89table1, x3.tij = t01)
tab1 <- subset(Huggins89table1,
rowSums(Select(Huggins89table1, "y")) > 0)
# Same as
# subset(Huggins89table1, y01 + y02 + y03 + y04 + y05 + y06 + y07 + y08 + y09 + y10 > 0)
# Long way to do it:
fit.th <-
vglm(cbind(y01, y02, y03, y04, y05, y06, y07, y08, y09, y10) ~ x2 + x3.tij,
xij = list(x3.tij ~ t01 + t02 + t03 + t04 + t05 + t06 + t07 + t08 +
t09 + t10 - 1),
posbernoulli.t(parallel.t = TRUE ~ x2 + x3.tij),
data = tab1, trace = TRUE,
form2 = ~ x2 + x3.tij + t01 + t02 + t03 + t04 + t05 + t06 + t07 + t08 +
t09 + t10)
#> Iteration 1: loglikelihood = -97.120355
#> Iteration 2: loglikelihood = -97.079804
#> Iteration 3: loglikelihood = -97.079782
#> Iteration 4: loglikelihood = -97.079782
# Short way to do it:
Fit.th <- vglm(Select(tab1, "y") ~ x2 + x3.tij,
xij = list(Select(tab1, "t", as.formula = TRUE,
sort = FALSE, lhs = "x3.tij", rhs = "0")),
posbernoulli.t(parallel.t = TRUE ~ x2 + x3.tij),
data = tab1, trace = TRUE,
form2 = Select(tab1, prefix = TRUE, as.formula = TRUE))
#> Iteration 1: loglikelihood = -97.120355
#> Iteration 2: loglikelihood = -97.079804
#> Iteration 3: loglikelihood = -97.079782
#> Iteration 4: loglikelihood = -97.079782