Generate correlated data (predictors) for one unit
This is used to generate data for one unit. It was recently redesigned to serve as a building block in a multi-level data simulation exercise. The new arguments "unit" and "idx" can be set to NULL to remove the multi-level unit and row naming features. This function uses the rockchalk::mvrnorm function, but adds a convenience layer by allowing users to supply standard deviations and a correlation matrix rather than the variance-covariance matrix.
Usage
genX(
N,
means,
sds,
rho,
Sigma = NULL,
intercept = TRUE,
col.names = NULL,
unit = NULL,
idx = FALSE
)
Arguments
- N
Number of cases desired
- means
A vector of means for p variables. It is optional to name them. This implicitly sets the dimension of the predictor matrix as N x p. If no names are supplied, the automatic variable names will be "x1", "x2", and so forth. If means is named, such as c("myx1" = 7, "myx2" = 13, "myx3" = 44), those names will become column names in the output matrix.
- sds
Standard deviations for the variables. If fewer than p values are supplied, they will be recycled.
- rho
Correlation coefficient for p variables. Several input formats are allowed (see lazyCor). This can be a single number (common correlation among all variables), a full matrix of correlations among all variables, or a vector that is interpreted as the strictly lower triangle (a vech).
- Sigma
A p x p variance/covariance matrix. When it is supplied, sds and rho can be omitted, as in the final example.
- intercept
If TRUE (the default), the first column of the result is an "Intercept" column filled with 1.
- col.names
Names supplied here will override column names supplied with the means parameter. If no names are supplied with means, or here, we will name variables x1, x2, x3, ... xp, with Intercept at front of list if intercept = TRUE.
- unit
A character string for the name of the unit being simulated. Might be referred to as a "group" or "district" or "level 2" membership indicator.
- idx
If set TRUE, a column "idx" is added, numbering the rows from 1:N. If the argument unit is not NULL, then idx is set to TRUE, but that behavior can be overridden by setting idx = FALSE.
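The relationship between sds, rho, and Sigma can be sketched in base R. This is only an illustration of the arithmetic, not the package's internal code: the names p, R, and Sigma_sketch are made up here, and the use of rep() to recycle sds is an assumption about how recycling behaves.

```r
# Illustration only: build a covariance matrix from standard deviations
# and a single common correlation, using base R.
p   <- 4                               # number of variables (length of means)
sds <- rep(c(3, 6), length.out = p)    # assumed recycling: 3, 6, 3, 6
rho <- 0.5                             # common correlation among all variables

R <- matrix(rho, nrow = p, ncol = p)   # correlation matrix with rho everywhere
diag(R) <- 1                           # a variable correlates 1 with itself

# Sigma[i, j] = sds[i] * R[i, j] * sds[j]
Sigma_sketch <- diag(sds) %*% R %*% diag(sds)
```

The diagonal of Sigma_sketch holds the variances (sds squared), and stats::cov2cor(Sigma_sketch) recovers R.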
Value
A data frame with rownames to specify unit and individual values, including an attribute "unit" with the unit's name.
Details
The return object is a data frame. This allows a character variable "unit" to be included in the result, which helps with multi-level models. If unit is not NULL, its value is added as a column in the data frame, the rownames are constructed by pasting the unit name and idx, and idx is included as another column unless the user explicitly sets idx = FALSE.
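Since each call returns a data frame for one unit, a multi-level data set could be assembled by stacking several calls. The following is a minimal sketch, assuming the rockchalk package is installed; the unit names and the object names units and dat are arbitrary choices for illustration.

```r
library(rockchalk)

set.seed(1234)                         # arbitrary seed, for repeatable draws
units <- c("Kansas", "Iowa", "Nebraska")

# One genX() call per unit, then stack the resulting data frames by row.
dat <- do.call(rbind, lapply(units, function(u) {
  genX(5, means = c(7, 8), sds = 3, rho = 0.4, unit = u)
}))

table(dat$unit)                        # count of rows per unit
```

Each piece carries its own "unit" column, so the group membership is still available after stacking.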
Author
Paul Johnson pauljohn@ku.edu
Examples
X1 <- genX(10, means = c(7, 8), sds = 3, rho = .4)
X2 <- genX(10, means = c(7, 8), sds = 3, rho = .4, unit = "Kansas")
head(X2)
#> Intercept x1 x2 unit idx
#> Kansas_1 1 5.247658 3.193354 Kansas 1
#> Kansas_2 1 2.266609 7.569167 Kansas 2
#> Kansas_3 1 3.103034 5.953705 Kansas 3
#> Kansas_4 1 7.279783 9.148160 Kansas 4
#> Kansas_5 1 5.735378 13.077232 Kansas 5
#> Kansas_6 1 7.426782 2.688957 Kansas 6
X3 <- genX(10, means = c(7, 8), sds = 3, rho = .4, idx = FALSE, unit = "Iowa")
head(X3)
#> Intercept x1 x2 unit
#> Iowa_1 1 5.9195522 10.481373 Iowa
#> Iowa_2 1 4.5889921 9.206170 Iowa
#> Iowa_3 1 5.5154397 6.670921 Iowa
#> Iowa_4 1 6.3777723 8.586272 Iowa
#> Iowa_5 1 -0.7553535 3.447440 Iowa
#> Iowa_6 1 8.7480249 15.494415 Iowa
X4 <- genX(10, means = c("A" = 7, "B" = 8), sds = c(3), rho = .4)
head(X4)
#> Intercept A B
#> 1 1 7.289502 7.635676
#> 2 1 5.914488 8.206094
#> 3 1 9.613455 10.969463
#> 4 1 8.701256 9.423240
#> 5 1 4.524235 7.946125
#> 6 1 9.611468 10.974642
X5 <- genX(10, means = c(7, 3, 7, 5), sds = c(3, 6),
rho = .5, col.names = c("Fred", "Sally", "Henry", "Barbi"))
head(X5)
#> Intercept Fred Sally Henry Barbi
#> 1 1 0.3500862 -9.240437 5.108125 -12.9928656
#> 2 1 9.5570578 5.238086 11.350593 9.0728925
#> 3 1 3.3137280 -10.010462 6.609930 -9.8478502
#> 4 1 1.2043423 -4.664308 4.410985 -6.5353145
#> 5 1 8.1651683 1.979484 5.386327 17.1623800
#> 6 1 1.6907264 -6.585578 1.323613 -0.2586328
Sigma <- lazyCov(Rho = c(.2, .3, .4, .5, .2, .1), Sd = c(2, 3, 1, 4))
X6 <- genX(10, means = c(5, 2, -19, 33), Sigma = Sigma, unit = "Winslow_AZ")
head(X6)
#> Intercept x1 x2 x3 x4 unit idx
#> Winslow_AZ_1 1 2.974632 -2.32074981 -19.85358 24.74796 Winslow_AZ 1
#> Winslow_AZ_2 1 8.700069 0.05243003 -18.47866 41.66165 Winslow_AZ 2
#> Winslow_AZ_3 1 3.986343 4.64275755 -19.82432 36.01817 Winslow_AZ 3
#> Winslow_AZ_4 1 5.054896 4.66060469 -17.56117 31.47403 Winslow_AZ 4
#> Winslow_AZ_5 1 8.283404 2.46749532 -18.84745 34.74754 Winslow_AZ 5
#> Winslow_AZ_6 1 4.348368 3.15879133 -17.44041 27.87164 Winslow_AZ 6