Title: Structure for Organizing Monte Carlo Simulation Designs
Description: Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues: it automatically re-simulates non-convergent results, prevents inadvertently overwriting simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.
Authors: Phil Chalmers [aut, cre], Matthew Sigal [ctb], Oguzhan Ogreden [ctb], Mikko Ronkko [ctb]
Maintainer: Phil Chalmers <[email protected]>
License: GPL (>=2)
Version: 2.17.2
Built: 2024-11-14 23:23:36 UTC
Source: https://github.com/philchalmers/simdesign
Given an input vector, replace elements of this vector with missing values according to some scheme. The default method replaces input values under an MCAR scheme, where on average 10% of the values are replaced with NAs. MAR and MNAR schemes are supported by replacing the default fun argument.
addMissing(y, fun = function(y, rate = 0.1, ...) rep(rate, length(y)), ...)
y: an input vector that will receive missing data in the form of NA values
fun: a user-defined function indicating the missing data mechanism for each element in y
...: additional arguments to be passed to fun
Given an input vector y, and other relevant variables inside (X) and outside (Z) the data-set, the three types of missingness are:

Missing completely at random (MCAR). This is realized by randomly sampling the values of the input vector (y) irrespective of the possible values in X and Z. Missing values are therefore randomly sampled, do not depend on any data characteristics, and are truly random.

Missing at random (MAR). This is realized when values in the dataset (X) predict the missing data mechanism in y; conceptually this is equivalent to P(NA | X). This requires the user to define a custom missing data function.

Missing not at random (MNAR). This is similar to MAR except that the missing mechanism comes from the value of y itself or from variables outside the working dataset; conceptually this is equivalent to P(NA | y, Z). This requires the user to define a custom missing data function.

Returns the input vector y with the sampled NA values (according to the fun scheme).
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run: 
set.seed(1)
y <- rnorm(1000)

## 10% missing rate with default fun
head(ymiss <- addMissing(y), 10)

## 50% missing with default fun
head(ymiss <- addMissing(y, rate = .5), 10)

## missing values only when female and low
X <- data.frame(group = sample(c('male', 'female'), 1000, replace=TRUE),
                level = sample(c('high', 'low'), 1000, replace=TRUE))
head(X)

fun <- function(y, X, ...){
    p <- rep(0, length(y))
    p[X$group == 'female' & X$level == 'low'] <- .2
    p
}
ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)

## missingness as a function of elements in X (i.e., a type of MAR)
fun <- function(y, X){
    # missingness with a logistic regression approach
    df <- data.frame(y, X)
    mm <- model.matrix(y ~ group + level, df)
    cfs <- c(-5, 2, 3) # intercept, group, and level coefs
    z <- cfs %*% t(mm)
    plogis(z)
}
ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)

## missing values when y elements are large (i.e., a type of MNAR)
fun <- function(y) ifelse(abs(y) > 1, .4, 0)
ymiss <- addMissing(y, fun=fun)
tail(cbind(y, ymiss), 10)

## End(Not run)
Compute all relevant test statistics, parameter estimates, detection rates, and so on. This is the computational heavy lifting portion of the Monte Carlo simulation. Users may define a single analysis function to perform all the analyses in the same function environment, or may pass a list of named functions to runSimulation to allow for a more modularized approach where the analyses are performed in independent blocks (but share the same generated data). Note that if a suitable Generate function was not supplied then this function can be used to both generate and analyse the Monte Carlo data (though in general this setup is not recommended for larger simulations).
Analyse(condition, dat, fixed_objects)
condition: a single row from the design input (as a data.frame), indicating the simulation conditions
dat: the data object returned from the Generate function (usually a data.frame, matrix, vector, or list)
fixed_objects: object passed down from runSimulation
In some cases, it may be easier to change the output to a named list containing different parameter configurations (e.g., when determining RMSE values for a large set of population parameters).

The use of try functions is generally not required in this function because Analyse is internally wrapped in a try call. Therefore, if a function stops early then this will cause the function to halt internally, the message which triggered the stop will be recorded, and Generate will be called again to obtain a different dataset. That said, it may be useful for users to throw their own stop commands if the data should be re-drawn for other reasons (e.g., an estimated model terminated correctly but the maximum number of iterations was reached).
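To make the re-draw mechanism concrete, the following is a minimal sketch (illustrative only, not taken from the package documentation) of a user-thrown stop() inside an analysis function; here a fit that failed to converge triggers a fresh dataset. It assumes dat contains a binary y and a numeric x column.

analyse <- function(condition, dat, fixed_objects) {
    mod <- glm(y ~ x, family = binomial, data = dat)
    # flag non-converged fits so that Generate() is called again
    if (!mod$converged) stop('GLM failed to converge; re-drawing data')
    c(slope = unname(coef(mod)[2]))
}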
Returns a named numeric vector or data.frame with the values of interest (e.g., p-values, effect sizes, etc.), or a list containing values of interest (e.g., separate matrix and vector of parameter estimates corresponding to elements in parameters). If a data.frame is returned with more than 1 row then these objects will be wrapped into suitable list objects.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
stop, AnalyseIf, manageWarnings
## Not run: 
analyse <- function(condition, dat, fixed_objects) {

    # require packages/define functions if needed, or better yet index with the :: operator
    require(stats)
    mygreatfunction <- function(x) print('Do some stuff')

    # wrap computational statistics in try() statements to control estimation problems
    welch <- t.test(DV ~ group, dat)
    ind <- stats::t.test(DV ~ group, dat, var.equal=TRUE)

    # In this function the p values for the t-tests are returned,
    # and make sure to name each element, for future reference
    ret <- c(welch = welch$p.value,
             independent = ind$p.value)
    return(ret)
}

# A more modularized example approach
analysis_welch <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)
    ret <- c(p=welch$p.value)
    ret
}

analysis_ind <- function(condition, dat, fixed_objects) {
    ind <- t.test(DV ~ group, dat, var.equal=TRUE)
    ret <- c(p=ind$p.value)
    ret
}

# pass functions as a named list
# runSimulation(..., analyse=list(welch=analysis_welch, independent=analysis_ind))

## End(Not run)
Determine whether the Analyse() function should be executed

This function is designed to prevent specific analysis function executions when the design conditions are not met. Primarily useful when the analyse argument to runSimulation was input as a named list object, where some of the analysis functions are not interesting/compatible with the generated data and should therefore be skipped.
AnalyseIf(x, condition = NULL)
x: logical statement to evaluate. If the statement evaluates to TRUE then the analysis function is executed; otherwise it is skipped
condition: (optional) the current design condition. This does not need to be supplied if the expression in x can be evaluated without it (e.g., when the design elements have already been attached to the environment)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run: 
Design <- createDesign(N=c(10,20,30),
                       var.equal = c(TRUE, FALSE))

Generate <- function(condition, fixed_objects) {
    Attach(condition)
    dat <- data.frame(DV = rnorm(N*2),
                      IV = gl(2, N, labels=c('G1', 'G2')))
    dat
}

# always run this analysis for each row in Design
Analyse1 <- function(condition, dat, fixed_objects) {
    mod <- t.test(DV ~ IV, data=dat)
    mod$p.value
}

# Only perform analysis when variances are equal and N = 20 or 30
Analyse2 <- function(condition, dat, fixed_objects) {
    AnalyseIf(var.equal && N %in% c(20, 30), condition)
    mod <- t.test(DV ~ IV, data=dat, var.equal=TRUE)
    mod$p.value
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha=.05)
    ret
}

#-------------------------------------------------------------------

# append names 'Welch' and 'independent' to associated output
res <- runSimulation(design=Design, replications=100,
                     generate=Generate,
                     analyse=list(Welch=Analyse1, independent=Analyse2),
                     summarise=Summarise)
res

# leave results unnamed
res <- runSimulation(design=Design, replications=100,
                     generate=Generate,
                     analyse=list(Analyse1, Analyse2),
                     summarise=Summarise)

## End(Not run)
The behaviour of this function is very similar to attach, however it is environment specific, and therefore only remains defined in a given function rather than in the Global Environment. Hence, this function is much safer to use than attach, which incidentally should never be used in your code. This is useful primarily as a convenience function when you prefer to call the variable names in condition directly rather than indexing with condition$sample_size or with(condition, sample_size), for example.
Attach( ..., omit = NULL, check = TRUE, attach_listone = TRUE, RStudio_flags = FALSE )
...: a comma separated list of data.frame, tibble, list, or matrix objects whose named (column) elements should be placed in the current working environment
omit: an optional character vector containing the names of objects that should not be attached to the current environment. For instance, if the objects named 'a' and 'b' should not be attached then use omit = c('a', 'b')
check: logical; check to see if the function will accidentally replace previously defined variables with the same names as those being attached? Default is TRUE
attach_listone: logical; if the element to be assigned is a list of length one then assign the first element of this list with the associated name. This generally avoids adding an often unnecessary list index (e.g., [[1L]])
RStudio_flags: logical; print R script comments that disable flagged missing variables in RStudio? Requires comments of the form # !diagnostics suppress=<comma separated variable names>
Note that if you are using RStudio with the "Warn if variable used has no definition in scope" diagnostic flag enabled then using Attach() will raise suspicions. To suppress such issues, you can either disable such flags (the atomic solution) or evaluate the following output in the R console and place the resulting comments in your working simulation file:

Attach(Design, RStudio_flags = TRUE)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Design <- createDesign(N1=c(10,20),
                       N2=c(10,20),
                       sd=c(1,2))
Design

# does not use Attach()
Generate <- function(condition, fixed_objects) {
    # condition = single row of Design input (e.g., condition <- Design[1,])
    N1 <- condition$N1
    N2 <- condition$N2
    sd <- condition$sd
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

# similar to above, but using the Attach() function instead of indexing
Generate <- function(condition, fixed_objects) {
    Attach(condition) # N1, N2, and sd are now 'attached' and visible
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

#####################
# NOTE: if you're using RStudio with code diagnostics on then evaluate + add the
# following output to your source file to manually support the flagged variables
Attach(Design, RStudio_flags=TRUE)

# Below is the same example, however with false positive missing variables
# suppressed when # !diagnostics ... is added to the source file(s)

# !diagnostics suppress=N1,N2,sd
Generate <- function(condition, fixed_objects) {
    Attach(condition) # N1, N2, and sd are now 'attached' and visible
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}
Example results from the Brown and Forsythe (1974) article on robust estimators for variance ratio tests. Statistical tests are organized by columns and the unique design conditions are organized by rows. See BF_sim_alternative for an alternative form of the same simulation. Code for this simulation is available on the wiki (https://github.com/philchalmers/SimDesign/wiki).
Phil Chalmers [email protected]
Brown, M. B. and Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364–367.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run: 
data(BF_sim)
head(BF_sim)

# Type I errors
subset(BF_sim, var_ratio == 1)

## End(Not run)
Example results from the Brown and Forsythe (1974) article on robust estimators for
variance ratio tests. Statistical tests and distributions are organized by columns
and the unique design conditions are organized by rows. See BF_sim
for an alternative
form of the same simulation where distributions are also included in the rows.
Code for this simulation is available on the wiki (https://github.com/philchalmers/SimDesign/wiki).
Phil Chalmers [email protected]
Brown, M. B. and Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364–367.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run: 
data(BF_sim_alternative)
head(BF_sim_alternative)

# Type I errors
subset(BF_sim_alternative, var_ratio == 1)

## End(Not run)
Computes the (relative) bias of a sample estimate from the parameter value.
Accepts estimate and parameter values, as well as estimate values which are in deviation form.
If relative bias is requested the estimate
and parameter
inputs are both required.
bias( estimate, parameter = NULL, type = "bias", abs = FALSE, percent = FALSE, unname = FALSE )
estimate: a numeric vector, matrix/data.frame, or list containing the parameter estimates
parameter: a numeric scalar or vector indicating the fixed parameter value(s). If NULL, the estimate input is assumed to already be in deviation form
type: type of bias statistic to return. Default ('bias') computes the standard bias; options 'relative', 'abs_relative', and 'standardized' are also supported
abs: logical; find the absolute bias between the parameters and estimates? This effectively just applies the abs transformation to the deviations prior to averaging
percent: logical; change returned result to percentage by multiplying by 100? Default is FALSE
unname: logical; apply unname to the results to remove any variable names?
Returns a numeric vector indicating the overall (relative/standardized) bias in the estimates.
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
pop <- 2
samp <- rnorm(100, 2, sd = 0.5)
bias(samp, pop)
bias(samp, pop, type = 'relative')
bias(samp, pop, type = 'standardized')

dev <- samp - pop
bias(dev)

# equivalent here
bias(mean(samp), pop)

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5),
             M2 = rnorm(100, 2, sd = 1))
bias(mat, parameter = 2)
bias(mat, parameter = 2, type = 'relative')
bias(mat, parameter = 2, type = 'standardized')

# different parameter associated with each column
mat <- cbind(M1=rnorm(1000, 2, sd = 0.25),
             M2 = rnorm(1000, 3, sd = .25))
bias(mat, parameter = c(2,3))

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5),
                 M2 = rnorm(100, 2, sd = 1))
bias(df, parameter = c(2,2))

# parameters of the same size
parameters <- 1:10
estimates <- parameters + rnorm(10)
bias(estimates, parameters)

# relative difference dividing by the magnitude of parameters
bias(estimates, parameters, type = 'abs_relative')

# relative bias as a percentage
bias(estimates, parameters, type = 'abs_relative', percent = TRUE)

# percentage error (PE) statistic given alpha (Type I error) and EDR() result
# edr <- EDR(results, alpha = .05)
edr <- c(.04, .05, .06, .08)
bias(matrix(edr, 1L), .05, type = 'relative', percent = TRUE)
This function computes bootstrap mean-square error estimates to approximate the sampling behavior
of the meta-statistics in SimDesign's summarise
functions. A single design condition is
supplied, and a simulation with max(Rstar)
replications is performed whereby the
generate-analyse results are collected. After obtaining these replication values, the
replications are further drawn from (with replacement) using the differing sizes in Rstar
to approximate the bootstrap MSE behavior given different replication sizes. Finally, given these
bootstrap estimates linear regression models are fitted using the predictor term
one_sqrtR = 1 / sqrt(Rstar)
to allow extrapolation to replication sizes not observed in
Rstar
. For more information about the method and subsequent bootstrap MSE plots,
refer to Koehler, Brown, and Haneuse (2009).
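Under the assumption that Monte Carlo error shrinks at the usual 1/sqrt(R) rate, the fitted zero-intercept regression implies a simple extrapolation rule (a sketch of the logic, consistent with the worked example below):

    MSE(R) \approx \beta \cdot \frac{1}{\sqrt{R}} \quad \Rightarrow \quad R \approx \left( \frac{\beta}{MSE} \right)^2

so the estimated slope can be inverted to approximate how many replications are required to reach a target MSE.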
bootPredict(condition, generate, analyse, summarise, fixed_objects = NULL, ...,
            Rstar = seq(100, 500, by = 100), boot_draws = 1000)

boot_predict(...)
condition: a data.frame consisting of one row from the design input
generate: see runSimulation
analyse: see runSimulation
summarise: see runSimulation
fixed_objects: see runSimulation
...: additional arguments to be passed to runSimulation
Rstar: a vector containing the size of the bootstrap subsets to obtain. Default investigates the vector [100, 200, 300, 400, 500] to compute the respective MSE terms
boot_draws: number of bootstrap replications to draw. Default is 1000
Returns a list of linear model objects (via lm) for each meta-statistic returned by the summarise() function.
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Koehler, E., Brown, E., & Haneuse, S. J.-P. A. (2009). On the Assessment of Monte Carlo Error in Simulation-Based Statistical Analyses. The American Statistician, 63, 155-162.

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
set.seed(4321)
Design <- createDesign(sigma = c(1, 2))

#-------------------------------------------------------------------

Generate <- function(condition, fixed_objects) {
    dat <- rnorm(100, 0, condition$sigma)
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    CIs <- t.test(dat)$conf.int
    names(CIs) <- c('lower', 'upper')
    ret <- c(mean = mean(dat), CIs)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- c(mu_bias = bias(results[,"mean"], 0),
             mu_coverage = ECR(results[,c("lower", "upper")], parameter = 0))
    ret
}

## Not run: 
# boot_predict supports only one condition at a time
out <- bootPredict(condition=Design[1L, , drop=FALSE],
                   generate=Generate, analyse=Analyse, summarise=Summarise)
out # list of fitted linear model(s)

# extract first meta-statistic
mu_bias <- out$mu_bias

dat <- model.frame(mu_bias)
print(dat)

# original R metric plot
R <- 1 / dat$one_sqrtR^2
plot(R, dat$MSE, type = 'b', ylab = 'MSE', main = "Replications by MSE")

plot(MSE ~ one_sqrtR, dat, main = "Bootstrap prediction plot",
     xlim = c(0, max(one_sqrtR)), ylim = c(0, max(MSE)),
     ylab = 'MSE', xlab = expression(1/sqrt(R)))
beta <- coef(mu_bias)
abline(a = 0, b = beta, lty = 2, col='red')

# what is the replication value when x-axis = .02? What's its associated expected MSE?
1 / .02^2 # number of replications
predict(mu_bias, data.frame(one_sqrtR = .02)) # y-axis value

# approximately how many replications to obtain MSE = .001?
(beta / .001)^2

## End(Not run)
Robustness interval criteria for empirical detection rate estimates and empirical coverage estimates defined by Bradley (1978). See EDR and ECR to obtain such estimates.
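As a point of orientation (stated here from Bradley, 1978; the function itself prints the exact criteria it applies when called without a rate input), the robustness intervals around a nominal Type I error rate are

    liberal: [0.5\alpha, \; 1.5\alpha], \qquad stringent: [0.9\alpha, \; 1.1\alpha]

so with alpha = .05 the liberal interval for an empirical detection rate is [.025, .075]; for coverage estimates (CI = TRUE) the analogous intervals are constructed around 1 - alpha.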
Bradley1978( rate, alpha = 0.05, type = "liberal", CI = FALSE, out.logical = FALSE, out.labels = c("conservative", "robust", "liberal"), unname = FALSE )
rate: (optional) numeric vector containing the empirical detection rate(s) or empirical confidence interval estimates. If supplied, a character vector with elements defined in out.labels is returned (or a logical vector when out.logical = TRUE). When the input is an empirical coverage rate the argument CI must be set to TRUE. If this input is missing, the interval criteria will be printed to the console
alpha: Type I error rate to be evaluated (default is .05)
type: character vector indicating the type of interval classification to use. Default is 'liberal', however can be 'stringent' to use Bradley's more stringent robustness criteria
CI: logical; should the robustness interval be constructed for empirical detection rates (FALSE, default) or for empirical coverage rates (TRUE)?
out.logical: logical; should the output vector be TRUE/FALSE indicating whether the supplied empirical detection rate/CI should be considered "robust"? Default is FALSE, in which case the out.labels elements are used instead
out.labels: character vector of length three indicating the classification labels according to the desired robustness interval
unname: logical; apply unname to the results to remove any variable names?
Phil Chalmers [email protected]
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
EDR, ECR, Serlin2000
# interval criteria used for empirical detection rates
Bradley1978()
Bradley1978(type = 'stringent')
Bradley1978(alpha = .01, type = 'stringent')

# intervals applied to empirical detection rate estimates
edr <- c(test1 = .05, test2 = .027, test3 = .051, test4 = .076, test5 = .024)
Bradley1978(edr)
Bradley1978(edr, out.logical=TRUE) # is robust?

#####
# interval criteria used for coverage estimates
Bradley1978(CI = TRUE)
Bradley1978(CI = TRUE, type = 'stringent')
Bradley1978(CI = TRUE, alpha = .01, type = 'stringent')

# intervals applied to empirical coverage rate estimates
ecr <- c(test1 = .950, test2 = .973, test3 = .949, test4 = .924, test5 = .976)
Bradley1978(ecr, CI=TRUE)
Bradley1978(ecr, CI=TRUE, out.logical=TRUE) # is robust?
Computes the congruence coefficient, also known as an "unadjusted" correlation or Tucker's congruence coefficient.
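For reference, Tucker's congruence coefficient for two vectors x and y is conventionally defined as (standard textbook form, not quoted from this package's documentation)

    CC(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}

which is Pearson's correlation computed without centering the inputs, hence the "unadjusted" label.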
CC(x, y = NULL, unname = FALSE)
x: a vector of values, or a matrix/data.frame whose columns will be used to compute all pairwise congruence coefficients
y: (optional) the second vector input to use when x is a vector
unname: logical; apply unname to the results to remove any variable names?
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
vec1 <- runif(1000)
vec2 <- runif(1000)
CC(vec1, vec2)

# compare to cor()
cor(vec1, vec2)

# column input
df <- data.frame(vec1, vec2, vec3 = runif(1000))
CC(df)
cor(df)
Sets the sub-stream RNG state for Pierre L'Ecuyer's (1999) algorithm. Should be used within distributed array jobs after suitable L'Ecuyer (1999) seeds have been distributed to each array, where each array is further defined to use multi-core processing. See clusterSetRNGStream for further information.
clusterSetRNGSubStream(cl, seed)
cl: A cluster from the parallel (or snow) package
seed: An integer vector of length 7 as given by .Random.seed when the L'Ecuyer-CMRG RNG is in use
invisible NULL
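No example accompanies this function above, so the following is a minimal sketch (an assumed workflow; the object names are illustrative) of seeding sub-streams for the cores within a single array job:

library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(1234)
seed <- .Random.seed # length-7 L'Ecuyer-CMRG state for this array job
cl <- makeCluster(2)
clusterSetRNGSubStream(cl, seed = seed) # distinct sub-stream per worker
stopCluster(cl)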
Form column standard deviations and variances for numeric arrays (or data frames).
colVars(x, na.rm = FALSE, unname = FALSE)

colSDs(x, na.rm = FALSE, unname = FALSE)
x: an array of two dimensions containing numeric, complex, integer or logical values, or a numeric data frame
na.rm: logical; remove missing values in each respective column?
unname: logical; apply unname to the results to remove any variable names?
Phil Chalmers [email protected]
results <- matrix(rnorm(100), ncol=4)
colnames(results) <- paste0('stat', 1:4)
colVars(results)
colSDs(results)

results[1,1] <- NA
colSDs(results)
colSDs(results, na.rm=TRUE)
colSDs(results, na.rm=TRUE, unname=TRUE)
Create a partially or fully-crossed data object reflecting the unique simulation design conditions. Each row of the returned object represents a unique simulation condition, and each column represents the named factor variables under study.
createDesign(..., subset, fractional = NULL, tibble = TRUE,
             stringsAsFactors = FALSE)

## S3 method for class 'Design'
print(x, list2char = TRUE, pillar.sigfig = 5, ...)
...: comma separated list of named input objects representing the simulation factors to completely cross. Note that these arguments are passed to expand.grid to perform the complete crossings
subset: (optional) a logical vector indicating elements or rows to keep to create a partially crossed simulation design
fractional: a fractional design matrix returned from the FrF2 package
tibble: logical; return a tibble object (the default) rather than a data.frame?
stringsAsFactors: logical; should character variable inputs be coerced to factors when building a data.frame? Default is FALSE
x: object returned by createDesign
list2char: logical; when printing, convert list elements to character vectors so that their contents are visible? Default is TRUE
pillar.sigfig: number of significant digits to print. Default is 5
Returns a tibble or data.frame containing the simulation experiment conditions to be evaluated in runSimulation.
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run: 
# modified example from runSimulation()
Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2))
Design

# remove N=10, SD=2 row from initial definition
Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2),
                       subset = !(N == 10 & SD == 2))
Design

# example with list inputs
Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2),
                       combo = list(c(0,0), c(0,0,1)))
Design # notice levels printed (not typical for tibble)
print(Design, list2char = FALSE) # standard tibble output

Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2),
                       combo = list(c(0,0), c(0,0,1)),
                       combo2 = list(c(5,10,5), c(6,7)))
Design
print(Design, list2char = FALSE) # standard tibble output

##########
## fractional factorial example

library(FrF2)
# help(FrF2)

# 7 factors in 32 runs
fr <- FrF2(32, 7)
dim(fr)
fr[1:6,]

# Create working simulation design given -1/1 combinations
fDesign <- createDesign(sample_size=c(100,200),
                        mean_diff=c(.25, 1, 2),
                        variance.ratio=c(1,4, 8),
                        equal_size=c(TRUE, FALSE),
                        dists=c('norm', 'skew'),
                        same_dists=c(TRUE, FALSE),
                        symmetric=c(TRUE, FALSE),
                        # remove same-normal combo
                        subset = !(symmetric & dists == 'norm'),
                        fractional=fr)
fDesign

## End(Not run)
Computes the detection rate for determining empirical coverage rates given a set of estimated
confidence intervals. Note that using 1 - ECR(CIs, parameter)
will provide the empirical
detection rate. Also supports computing the average width of the CIs, which may be useful when comparing
the efficiency of CI estimators.
ECR( CIs, parameter, tails = FALSE, CI_width = FALSE, complement = FALSE, names = NULL, unname = FALSE )
CIs: a numeric vector of length 2, or a matrix/data.frame with two columns per CI estimate, where the first element/column contains the lower bound and the second the upper bound
parameter: a numeric scalar indicating the fixed parameter value. Alternatively, a numeric vector with one element per row of CIs
tails: logical; when TRUE returns a vector of length 2 to indicate the proportion of times the parameter was lower or higher than the supplied interval, respectively. This is mainly only useful when the coverage region is not expected to be symmetric, and therefore is generally not required
CI_width: logical; rather than returning the overall coverage rate, return the average width of the CIs instead? Useful when comparing the efficiency of different CI estimators
complement: logical; rather than computing the proportion of population parameters within the CI, return the proportion outside the advertised CI (1 - ECR = alpha). In the case where only one value is provided, which normally would return a 0 if outside the CI or 1 if inside, the values will be switched (useful when using, for example, CI tests of the significance of parameters)
names: an optional character vector used to name the returned object. Generally useful when more than one CI estimate is investigated at once
unname: logical; apply unname to the results to remove any variable names?
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
CIs <- matrix(NA, 100, 2)
for(i in 1:100){
    dat <- rnorm(100)
    CIs[i,] <- t.test(dat)$conf.int
}
ECR(CIs, 0)
ECR(CIs, 0, tails = TRUE)
ECR(CIs, 0, complement = TRUE) # proportion outside interval

# single vector input
CI <- c(-1, 1)
ECR(CI, 0)
ECR(CI, 0, complement = TRUE)
ECR(CI, 2)
ECR(CI, 2, complement = TRUE)
ECR(CI, 2, tails = TRUE)

# parameters of the same size as CI
parameters <- 1:10
CIs <- cbind(parameters - runif(10), parameters + runif(10))
parameters <- parameters + rnorm(10)
ECR(CIs, parameters)

# average width of CIs
ECR(CIs, parameters, CI_width=TRUE)

# ECR() for multiple CI estimates in the same object
parameter <- 10
CIs <- data.frame(lowerCI_1=parameter - runif(10),
                  upperCI_1=parameter + runif(10),
                  lowerCI_2=parameter - 2*runif(10),
                  upperCI_2=parameter + 2*runif(10))
head(CIs)
ECR(CIs, parameter)
ECR(CIs, parameter, tails=TRUE)
ECR(CIs, parameter, CI_width=TRUE)

# often a good idea to provide names for the output
ECR(CIs, parameter, names = c('this', 'that'))
ECR(CIs, parameter, CI_width=TRUE, names = c('this', 'that'))
ECR(CIs, parameter, tails=TRUE, names = c('this', 'that'))
Computes the detection/rejection rate for determining empirical Type I error and power rates using information from p-values.
EDR(p, alpha = 0.05, unname = FALSE)
p: a numeric vector or matrix/data.frame of p-values
alpha: the detection threshold (typical values are .10, .05, and .01). Default is .05
unname: logical; apply unname to the results to remove any variable names?
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
rates <- numeric(100)
for(i in 1:100){
    dat <- rnorm(100)
    rates[i] <- t.test(dat)$p.value
}
EDR(rates)
EDR(rates, alpha = .01)

# multiple rates at once
rates <- cbind(runif(1000), runif(1000))
EDR(rates)
Repeat each design row the specified number of times. This is primarily used
for cluster computing where jobs are distributed with batches of replications
and later aggregated into a complete simulation object
(see runArraySimulation
and SimCollect
).
expandDesign(Design, repeat_conditions)
Design: object created by createDesign
repeat_conditions: integer vector used to repeat each design row the specified number of times. Can either be a single integer, which repeats each row this many times, or an integer vector equal to the number of total rows in the created object. This argument is useful when distributing independent row conditions to cluster computing environments, particularly when different replication counts are required across conditions (see runArraySimulation)
Returns a tibble or data.frame containing the simulation experiment conditions to be evaluated in runSimulation.
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
createDesign, SimCollect, runArraySimulation
## Not run: 
# repeat each row 4 times (for cluster computing)
Design <- createDesign(N = c(10, 20),
                       SD.equal = c(TRUE, FALSE))
Design4 <- expandDesign(Design, 4)
Design4

# repeat first two rows 2x and the rest 4 times (for cluster computing
# where first two conditions are faster to execute)
Design <- createDesign(SD.equal = c(TRUE, FALSE),
                       N = c(10, 100, 1000))
Design24 <- expandDesign(Design, c(2,2,rep(4, 4)))
Design24

## End(Not run)
Generate data from a single row in the design input (see runSimulation). R contains numerous approaches to generate data, some of which are contained in the base package, as well as in SimDesign (e.g., rmgh, rValeMaurelli, rHeadrick). However, the majority can be found in external packages; see CRAN's list of possible distributions here: https://CRAN.R-project.org/view=Distributions. Note that this function technically can be omitted if the data generation is provided in the Analyse step, though in general this setup is not recommended for larger simulations.
Generate(condition, fixed_objects)
condition: a single row from the design input (as a data.frame), indicating the simulation conditions
fixed_objects: object passed down from runSimulation
The use of try functions is generally not required in this function because Generate is internally wrapped in a try call. Therefore, if a function stops early then this will cause the function to halt internally, the message which triggered the stop will be recorded, and Generate will be called again to obtain a different dataset. That said, it may be useful for users to throw their own stop commands if the data should be re-drawn for other reasons (e.g., an estimated model terminated correctly but the maximum number of iterations was reached).
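As with Analyse, here is a short illustrative sketch (not taken from the package documentation) of a user-thrown stop() that forces a fresh draw when the generated sample is degenerate:

generate <- function(condition, fixed_objects) {
    dat <- data.frame(y = rbinom(50, 1, 0.05))
    # a constant response cannot be analysed; trigger a re-draw
    if (length(unique(dat$y)) == 1L) stop('Degenerate sample; re-drawing')
    dat
}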
Returns a single object containing the data to be analyzed (usually a vector, matrix, or data.frame), or a list.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
addMissing, Attach, rmgh, rValeMaurelli, rHeadrick
## Not run: 
generate <- function(condition, fixed_objects) {
    N1 <- condition$sample_sizes_group1
    N2 <- condition$sample_sizes_group2
    sd <- condition$standard_deviations

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    # just a silly example of a simulated parameter
    pars <- list(random_number = rnorm(1))

    list(dat=dat, parameters=pars)
}

# similar to above, but using the Attach() function instead of indexing
generate <- function(condition, fixed_objects) {
    Attach(condition)
    N1 <- sample_sizes_group1
    N2 <- sample_sizes_group2
    sd <- standard_deviations

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

generate2 <- function(condition, fixed_objects) {
    mu <- sample(c(-1,0,1), 1)
    dat <- rnorm(100, mu)
    dat # return simple vector (discard mu information)
}

generate3 <- function(condition, fixed_objects) {
    mu <- sample(c(-1,0,1), 1)
    dat <- data.frame(DV = rnorm(100, mu))
    dat
}

## End(Not run)
Determine whether the Generate() function should be executed

This function is designed to prevent specific generate function executions when the design conditions are not met. Primarily useful when the generate argument to runSimulation was input as a named list object, where each function should only be applied under specific design conditions (otherwise, the data generation moves to the next function in the list).
GenerateIf(x, condition = NULL)
x: logical statement to evaluate. If the statement evaluates to TRUE then the generate function is executed; otherwise the next function in the list is tried
condition: (optional) the current design condition. This does not need to be supplied if the expression in x can be evaluated without it
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run: 
# SimFunctions(nGenerate = 2)
Design <- createDesign(N=c(10,20,30),
                       var.equal = c(TRUE, FALSE))

Generate.G1 <- function(condition, fixed_objects) {
    GenerateIf(condition$var.equal == FALSE) # only run when unequal vars
    Attach(condition)
    dat <- data.frame(DV = c(rnorm(N), rnorm(N, sd=2)),
                      IV = gl(2, N, labels=c('G1', 'G2')))
    dat
}

Generate.G2 <- function(condition, fixed_objects) {
    Attach(condition)
    dat <- data.frame(DV = rnorm(N*2),
                      IV = gl(2, N, labels=c('G1', 'G2')))
    dat
}

# always run this analysis for each row in Design
Analyse <- function(condition, dat, fixed_objects) {
    mod <- t.test(DV ~ IV, data=dat)
    mod$p.value
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha=.05)
    ret
}

#-------------------------------------------------------------------

# pass the generate functions as a named list
res <- runSimulation(design=Design, replications=1000,
                     generate=list(G1=Generate.G1, G2=Generate.G2),
                     analyse=Analyse, summarise=Summarise)
res

## End(Not run)
Generate seeds to be passed to runSimulation's seed input. Values are sampled from 1 to 2147483647, or are generated using L'Ecuyer's (2002) CMRG method (returning either a list if arrayID is omitted, or the specific row value from this list if arrayID is included).
genSeeds(design = 1L, iseed = NULL, arrayID = NULL, old.seeds = NULL)

gen_seeds(...)
design: design matrix that requires a unique seed per condition, or a number indicating the number of seeds to generate. Default generates one number
iseed: the initial set.seed number used to generate a sequence of independent seeds according to the L'Ecuyer-CMRG (2002) method
arrayID: (optional) single integer input corresponding to the specific row in the design object when using runArraySimulation
old.seeds: (optional) vector or matrix of last seeds used in previous simulations to avoid repeating the same seed on a subsequent run. Note that this approach should be used sparingly as seeds set more frequently are more likely to correlate, and therefore provide less optimal random number behaviour (e.g., if performing a simulation on two runs to achieve 5000 * 2 = 10,000 replications this is likely reasonable, but for simulations with 100 * 2 = 200 replications this is more likely to be sub-optimal). Length must be equal to the number of rows in the design input
...: does nothing
Phil Chalmers [email protected]
# generate 1 seed (default)
genSeeds()

# generate 5 unique seeds
genSeeds(5)

# generate from nrow(design)
design <- createDesign(factorA=c(1,2,3),
                       factorB=letters[1:3])
seeds <- genSeeds(design)
seeds

# construct new seeds that are independent from original (use this sparingly)
newseeds <- genSeeds(design, old.seeds=seeds)
newseeds

# can be done in batches too
newseeds2 <- genSeeds(design, old.seeds=cbind(seeds, newseeds))
cbind(seeds, newseeds, newseeds2) # all unique

############
# generate seeds for runArraySimulation()
(iseed <- genSeeds()) # initial seed
seed_list <- genSeeds(design, iseed=iseed)
seed_list

# expand number of unique seeds given iseed (e.g., in case more replications
# are required at a later date)
seed_list_tmp <- genSeeds(nrow(design)*2, iseed=iseed)
str(seed_list_tmp) # first 9 seeds identical to seed_list

# more usefully for HPC, extract only the seed associated with an arrayID
arraySeed.15 <- genSeeds(nrow(design)*2, iseed=iseed, arrayID=15)
arraySeed.15
Get the array ID from an HPC array distribution job (e.g., from SLURM or
from optional command line arguments).
The array ID is used to index the rows in the design
object in runArraySimulation
. For instance,
a SLURM array with 10 independent jobs might have the following shell
instructions.
getArrayID(type = "slurm", trailingOnly = TRUE, ID.shift = 0L)
type |
an integer indicating the element from the result of commandArgs |
trailingOnly |
logical value passed to commandArgs |
ID.shift |
single integer value used to shift the array ID by a constant.
Useful when there are array range limitations that must be specified in the
shell files (e.g., the array can only be 10000 long but there are more rows
in the design |
#!/bin/bash -l
#SBATCH --time=00:01:00
#SBATCH --array=1-10
which names the associated jobs with the numbers 1 through 10.
getArrayID() then extracts this information within each array job,
which can be passed along via runArraySimulation(design, ..., arrayID = getArrayID())
to select the specific row(s) of the design object
(a fuller script sketch is given after the example below).
## Not run: 
# get slurm array ID
arrayID <- getArrayID()

# get ID based on first optional argument in shell specification
arrayID <- getArrayID(type = 1)

# pass to
# runArraySimulation(design, ...., arrayID = arrayID)

# increase detected arrayID by constant 9999 (for array specification limitations)
arrayID <- getArrayID(ID.shift=9999)

## End(Not run)
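Putting the pieces together, a minimal sketch of the R script side of the SLURM workflow described above (the file name, iseed value, and simulation functions are hypothetical placeholders; see runArraySimulation() for the documented interface):

# mySim.R -- hypothetical script submitted with the #SBATCH directives above
library(SimDesign)

Design <- createDesign(N = c(10, 20, 30, 40, 50),
                       d = c(0, 0.2))   # 10 rows, matching --array=1-10

Generate <- function(condition, fixed_objects)
    with(condition, rnorm(N, mean = d))
Analyse <- function(condition, dat, fixed_objects)
    c(p = t.test(dat)$p.value)
Summarise <- function(condition, results, fixed_objects)
    EDR(results, alpha = .05)

arrayID <- getArrayID(type = 'slurm')   # 1 through 10, one per array job

runArraySimulation(design = Design, replications = 1000,
                   generate = Generate, analyse = Analyse,
                   summarise = Summarise,
                   iseed = 1276149341,  # hypothetical value from genSeeds()
                   arrayID = arrayID,
                   filename = 'mysim')  # one saved result file per array job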
Computes the average/cumulative deviation given two continuous functions and an optional function representing the probability density function. Only one-dimensional integration is supported.
IRMSE(
  estimate,
  parameter,
  fn,
  density = function(theta, ...) 1,
  lower = -Inf,
  upper = Inf,
  ...
)
estimate |
a vector of parameter estimates |
parameter |
a vector of population parameters |
fn |
a continuous function where the first argument is to be integrated and the second argument is a vector of parameters or parameter estimates. This function represents an implied continuous function which uses the sample estimates or population parameters |
density |
(optional) a density function used to marginalize (i.e., average), where the first
argument is to be integrated, and must be of the form |
lower |
lower bound to begin numerical integration from |
upper |
upper bound to finish numerical integration to |
... |
additional parameters to pass to |
The integrated root mean-square error (IRMSE) is of the form

$$IRMSE(\hat{\psi}, \psi) = \sqrt{\int \left[f(\theta, \hat{\psi}) - f(\theta, \psi)\right]^2 g(\theta, \ldots)\, d\theta}$$

where $g(\theta, \ldots)$ is the density function used to marginalize the continuous sample
($f(\theta, \hat{\psi})$) and population ($f(\theta, \psi)$) functions.
returns a single numeric
term indicating the average/cumulative deviation
given the supplied continuous functions
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
# logistic regression function with one slope and intercept
fn <- function(theta, param) 1 / (1 + exp(-(param[1] + param[2] * theta)))

# sample and population sets
est <- c(-0.4951, 1.1253)
pop <- c(-0.5, 1)

theta <- seq(-10,10,length.out=1000)
plot(theta, fn(theta, pop), type = 'l', col='red', ylim = c(0,1))
lines(theta, fn(theta, est), col='blue', lty=2)

# cumulative result (i.e., standard integral)
IRMSE(est, pop, fn)

# integrated RMSE result by marginalizing over a N(0,1) distribution
den <- function(theta, mean, sd) dnorm(theta, mean=mean, sd=sd)
IRMSE(est, pop, fn, den, mean=0, sd=1)

# this specification is equivalent to the above
den2 <- function(theta, ...) dnorm(theta, ...)
IRMSE(est, pop, fn, den2, mean=0, sd=1)
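As a sanity check on the definition above, the marginalized IRMSE can be reproduced directly with integrate(), reusing fn, est, pop, and den from the example (a minimal sketch):

# squared deviation between sample and population functions, weighted by N(0,1)
sq_dev <- function(theta) (fn(theta, est) - fn(theta, pop))^2 * den(theta, mean=0, sd=1)
sqrt(integrate(sq_dev, lower = -Inf, upper = Inf)$value) # ~ IRMSE(est, pop, fn, den, mean=0, sd=1)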
Computes the average absolute deviation of a sample estimate from the parameter value. Accepts estimate and parameter values, as well as estimate values which are in deviation form.
MAE(estimate, parameter = NULL, type = "MAE", percent = FALSE, unname = FALSE)
estimate |
a |
parameter |
a |
type |
type of deviation to compute. Can be |
percent |
logical; change returned result to percentage by multiplying by 100? Default is FALSE |
unname |
logical; apply unname to the results to remove any variable names? |
returns a numeric vector indicating the overall mean absolute error in the estimates
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
RMSE
pop <- 1
samp <- rnorm(100, 1, sd = 0.5)
MAE(samp, pop)

dev <- samp - pop
MAE(dev)
MAE(samp, pop, type = 'NMAE')
MAE(samp, pop, type = 'SMAE')

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
MAE(mat, parameter = 2)

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
MAE(df, parameter = c(2,2))

# parameters of the same size
parameters <- 1:10
estimates <- parameters + rnorm(10)
MAE(estimates, parameters)
Function provides more nuanced management of known messages
that appear in function calls outside the front-end user's control
(e.g., functions written in third-party packages). Specifically,
this function provides a less nuclear approach than quiet
and friends, which suppress all cat and message outputs that are raised,
and instead allows specific messages to be
raised either to warnings or, even more extremely, to errors. Note that for
messages that are not suppressed, the order in which the output and message
calls appear in the original function is not retained.
manageMessages(
  expr,
  allow = NULL,
  message2warning = NULL,
  message2error = NULL,
  ...
)
expr |
expression to be evaluated (e.g., ret <- |
allow |
(optional) a |
message2warning |
(optional) Input can be a |
message2error |
(optional) Input can be a |
... |
additional arguments passed to |
returns the original result of eval(expr), with message
outputs either left as-is, raised to warnings or errors, or suppressed (depending
on the input specifications)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
## Not run: 
myfun <- function(x, warn=FALSE){
    message('This function is rather chatty')
    cat("It even prints in different output forms!\n")
    message('And even at different ')
    cat(" many times!\n")
    cat("Too many messages can be annoying \n")
    if(warn)
        warning('It may even throw warnings ')
    x
}

out <- myfun(1)
out

# tell the function to shhhh
out <- quiet(myfun(1))
out

# same default behaviour as quiet(), but potential for nuance
out2 <- manageMessages(myfun(1))
identical(out, out2)

# allow some messages to still get printed
out2 <- manageMessages(myfun(1), allow = "many times!")
out2 <- manageMessages(myfun(1), allow = "This function is rather chatty")

# note: . matches single character (regex)
out2 <- manageMessages(myfun(1), allow = c("many times.",
                                           "This function is rather chatty"))

# convert specific message to warning
out3 <- manageMessages(myfun(1), message2warning = "many times!")
identical(out, out3)

# other warnings also get through
out3 <- manageMessages(myfun(1, warn=TRUE), message2warning = "times!")
identical(out, out3)

# convert message to error
manageMessages(myfun(1), message2error = "m... times!")

# multiple message intensity changes
manageMessages(myfun(1),
               message2warning = "It even prints in different output forms",
               message2error = "many times!")

manageMessages(myfun(1),
               allow = c("This function is rather chatty",
                         "Too many messages can be annoying"),
               message2warning = "It even prints in different output forms",
               message2error = "many times!")

## End(Not run)
Function provides more nuanced management of known warning
messages that appear in function calls outside the front-end user's control
(e.g., functions written in third-party packages). Specifically,
this function provides a less nuclear approach than
suppressWarnings, which suppresses all warning messages
rather than only those known to be innocuous
to the current application, or globally setting options(warn=2),
which has the opposite effect of treating all warning messages as errors
during the function's execution. To avoid these two extreme behaviours,
character vectors can instead be supplied to this function
to leave the raised warnings
as-is (default behaviour), raise only specific warning messages to errors,
or specify warning messages that can generally be ignored
(and therefore suppressed) while still allowing new or yet-to-be-discovered
warnings to be raised.
manageWarnings(expr, warning2error = FALSE, suppress = NULL, ...)
expr |
expression to be evaluated (e.g., ret <- |
warning2error |
Alternatively, and more useful for specificity reasons,
input can be a |
suppress |
a |
... |
additional arguments passed to |
In general, global/nuclear management of warning messages should be avoided
as it is generally bad practice. On one extreme,
when suppressing all warning messages using suppressWarnings,
potentially important warning messages become muffled, which is problematic
if the code developer has not yet become aware of these (now muffled) warnings.
Moreover, this becomes a long-term sustainability issue when third-party functions
that the developer's code depends upon throw new warnings in the future, as the
code developer will be less likely to become aware of these newly implemented warnings.
On the other extreme, where all warning messages are turned into errors
using options(warn=2), innocuous warning messages can and will be (unintentionally)
raised to errors. This negatively affects the logical workflow of the
developer's functions, where more error messages must now be manually managed
(e.g., via tryCatch), including the known-to-be-innocuous
warning messages, as these will now be treated as errors.
To avoid these extremes, front-end users should first make note of the warning messages
raised in their prior executions, and organize these messages
into vectors of ignorable warnings (least severe), known/unknown warnings
that should remain as warnings (even if not yet known to the code developer),
and explicit warnings that ought to be considered errors for the current
application (most severe). Once collected, these can be passed to the
warning2error argument to increase the intensity of a specific warning
raised, or to the suppress argument to suppress only those messages that
have been deemed ignorable a priori (thereby allowing all other warning
messages to be raised); a minimal sketch of this pattern appears below.
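As a minimal sketch of this categorization workflow (the warning strings and the fit_model() stand-in below are hypothetical placeholders):

fit_model <- function() {              # toy stand-in for a third-party call
    warning('ill-conditioned Hessian') # deemed ignorable a priori
    warning('something unexpected')    # not catalogued: should stay a warning
    'converged fit object'
}

ignorable <- c('ill-conditioned Hessian') # suppress silently
severe    <- c('model did not converge')  # escalate to errors when present

ret <- manageWarnings(fit_model(),
                      warning2error = severe,
                      suppress = ignorable)
ret  # 'something unexpected' is still raised as a regular warning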
returns the original result of eval(expr)
, with warning
messages either left the same, increased to errors, or suppressed (depending
on the input specifications)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
## Not run: 
fun <- function(warn1=FALSE, warn2=FALSE, warn3=FALSE,
                warn_trailing = FALSE, error=FALSE){
    if(warn1) warning('Message one')
    if(warn2) warning('Message two')
    if(warn3) warning('Message three')
    if(warn_trailing)
        warning(sprintf('Message with lots of random trailings: %s',
                        paste0(sample(letters, sample(1:20, 1)), collapse=',')))
    if(error) stop('terminate function call')
    return('Returned from fun()')
}

# normal run (no warnings or errors)
out <- fun()
out

# these are all the same
manageWarnings(out <- fun())
out <- manageWarnings(fun())
out <- fun() |> manageWarnings()

# errors treated normally
fun(error=TRUE)
fun(error=TRUE) |> manageWarnings()

# all warnings/returns treated normally by default
ret1 <- fun(warn1=TRUE)
ret2 <- fun(warn1=TRUE) |> manageWarnings()
identical(ret1, ret2)

# all warnings converted to errors (similar to options(warn=2), but local)
fun(warn1=TRUE) |> manageWarnings(warning2error=TRUE)
fun(warn2=TRUE) |> manageWarnings(warning2error=TRUE)

# Specific warnings treated as errors (others stay as warnings)
# Here, treat first warning message as error but not the second or third
ret <- fun(warn1=TRUE)  # warning
ret <- fun(warn1=TRUE) |> manageWarnings("Message one")  # now error
ret <- fun(warn2=TRUE) |> manageWarnings("Message one")  # still a warning

# multiple warnings raised but not converted as they do not match criteria
fun(warn2=TRUE, warn3=TRUE)
fun(warn2=TRUE, warn3=TRUE) |> manageWarnings("Message one")

# Explicitly convert multiple warning messages, allowing others through.
#   This is generally the best use of the function's specificity
fun(warn1=TRUE, warn2=TRUE)
fun(warn1=TRUE) |>  # error given either message
    manageWarnings(c("Message one", "Message two"))
fun(warn2=TRUE) |>
    manageWarnings(c("Message one", "Message two"))

# last warning gets through (left as valid warning)
ret <- fun(warn3=TRUE) |>
    manageWarnings(c("Message one", "Message two"))
ret

# suppress warnings that have only partial matching
fun(warn_trailing=TRUE)
fun(warn_trailing=TRUE)
fun(warn_trailing=TRUE)

# partial match, therefore suppressed
fun(warn_trailing=TRUE) |>
    manageWarnings(suppress="Message with lots of random trailings: ")

# multiple suppress strings
fun(warn_trailing=TRUE) |>
    manageWarnings(suppress=c("Message with lots of random trailings: ",
                              "Suppress this too"))

# could also use .* to catch all remaining characters (finer regex control)
fun(warn_trailing=TRUE) |>
    manageWarnings(suppress="Message with lots of random trailings: .*")

###########
# Combine with quiet() and suppress argument to suppress innocuous messages

fun <- function(warn1=FALSE, warn2=FALSE, warn3=FALSE, error=FALSE){
    message('This function is rather chatty')
    cat("It even prints in different output forms!\n")
    if(warn1) warning('Message one')
    if(warn2) warning('Message two')
    if(warn3) warning('Message three')
    if(error) stop('terminate function call')
    return('Returned from fun()')
}

# normal run (no warnings or errors, but messages)
out <- fun()
out <- quiet(fun()) # using "indoor voice"

# suppress all print messages and warnings (not recommended)
fun(warn2=TRUE) |> quiet()
fun(warn2=TRUE) |> quiet() |> suppressWarnings()

# convert all warnings to errors, and keep suppressing messages via quiet()
fun(warn2=TRUE) |> quiet() |> manageWarnings(warning2error=TRUE)

# define tolerable warning messages (only warn1 deemed ignorable)
ret <- fun(warn1=TRUE) |> quiet() |>
    manageWarnings(suppress = 'Message one')

# all other warnings raised to an error except ignorable ones
fun(warn1=TRUE, warn2=TRUE) |> quiet() |>
    manageWarnings(warning2error=TRUE, suppress = 'Message one')

# only warn2 raised to an error explicitly (warn3 remains as warning)
ret <- fun(warn1=TRUE, warn3=TRUE) |> quiet() |>
    manageWarnings(warning2error = 'Message two', suppress = 'Message one')
fun(warn1=TRUE, warn2 = TRUE, warn3=TRUE) |> quiet() |>
    manageWarnings(warning2error = 'Message two', suppress = 'Message one')

###########################
# Practical example, converting warning into error for model that
#   failed to converge normally

library(lavaan)

## The industrialization and Political Democracy Example
## Bollen (1989), page 332
model <- '
  # latent variable definitions
     ind60 =~ x1 + x2 + x3
     dem60 =~ y1 + a*y2 + b*y3 + c*y4
     dem65 =~ y5 + a*y6 + b*y7 + c*y8

  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60

  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

# throws a warning
fit <- sem(model, data = PoliticalDemocracy, control=list(iter.max=60))

# for a simulation study, often better to treat this as an error
fit <- sem(model, data = PoliticalDemocracy, control=list(iter.max=60)) |>
    manageWarnings(warning2error = "the optimizer warns that a solution has NOT been found!")

## End(Not run)
The mean-square relative standard error (MSRSE) compares standard error estimates to the standard deviation of the respective parameter estimates. Values close to 1 indicate that the behavior of the standard errors closely matched the sampling variability of the parameter estimates.
MSRSE(SE, SD, percent = FALSE, unname = FALSE)
SE |
a |
SD |
a |
percent |
logical; change returned result to percentage by multiplying by 100? Default is FALSE |
unname |
logical; apply unname to the results to remove any variable names? |
Mean-square relative standard error (MSRSE) is expressed as

$$MSRSE = \frac{SD(\psi)^2}{\frac{1}{R}\sum_{r=1}^{R} SE(\psi_r)^2}$$

where $SE(\psi_r)$ represents the estimate of the standard error at the $r$th
simulation replication, and $SD(\psi)$ represents the standard deviation estimate
of the parameters across all $R$ replications. Note that $SD(\psi)^2$ is used,
which corresponds to the variance of $\psi$.
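To make the definition concrete, a minimal sketch (with simulated per-replication estimates and standard errors) that computes the ratio directly from the formula above:

set.seed(42)
R   <- 500
est <- rnorm(R, 0, 0.5)          # parameter estimates across R replications
SE  <- 0.5 + rnorm(R, 0, 0.01)   # standard error estimates per replication

# ratio from the definition above; values near 1 indicate well-calibrated SEs
sd(est)^2 / mean(SE^2)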
returns a vector
of ratios indicating the relative performance
of the standard error estimates to the observed parameter standard deviation.
Values less than 1 indicate that the standard errors were larger than the standard
deviation of the parameters (hence, the SEs are interpreted as more conservative),
while values greater than 1 were smaller than the standard deviation of the
parameters (i.e., more liberal SEs)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
Generate <- function(condition, fixed_objects) {
    X <- rep(0:1, each = 50)
    y <- 10 + 5 * X + rnorm(100, 0, .2)
    data.frame(y, X)
}

Analyse <- function(condition, dat, fixed_objects) {
    mod <- lm(y ~ X, dat)
    so <- summary(mod)
    ret <- c(SE = so$coefficients[,"Std. Error"],
             est = so$coefficients[,"Estimate"])
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    MSRSE(SE = results[,1:2], SD = results[,3:4])
}

results <- runSimulation(replications=500, generate=Generate,
                         analyse=Analyse, summarise=Summarise)
results
This is a wrapper to the function c(); however, it names the respective elements
according to their input object names. For this reason, nesting nc() calls
is not recommended (joining independent nc() calls via c() is, however, reasonable).
nc(..., use.names = FALSE, error.on.duplicate = TRUE)
... |
objects to be concatenated |
use.names |
logical indicating if |
error.on.duplicate |
logical; if the same object name appears in the returning object
should an error be thrown? Default is |
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
A <- 1
B <- 2
C <- 3
names(C) <- 'LetterC'

# compare the following
c(A, B, C) # unnamed
nc(A, B, C) # named
nc(this=A, B, C) # respects override named (same as c() )
nc(this=A, B, C, use.names = TRUE) # preserve original name

## Not run: 
# throws errors if names not unique
nc(this=A, this=B, C)
nc(LetterC=A, B, C, use.names=TRUE)

## End(Not run)

# poor input choice names
nc(t.test(c(1:2))$p.value, t.test(c(3:4))$p.value)

# better to explicitly provide name
nc(T1 = t.test(c(1:2))$p.value, T2 = t.test(c(3:4))$p.value)

# vector of unnamed inputs
A <- c(5,4,3,2,1)
B <- c(100, 200)

nc(A, B, C) # A's and B's numbered uniquely
c(A, B, C)  # compare
nc(beta=A, B, C) # replacement of object name

# retain names attributes (but append object name, when appropriate)
names(A) <- letters[1:5]
nc(A, B, C)
nc(beta=A, B, C)
nc(A, B, C, use.names=TRUE)

# mix and match if some named elements work while others do not
c( nc(A, B, use.names=TRUE), nc(C))

## Not run: 
# error, 'b' appears twice
names(B) <- c('b', 'b2')
nc(A, B, C, use.names=TRUE)

## End(Not run)

# List input
A <- list(1)
B <- list(2:3)
C <- list('C')
names(C) <- 'LetterC'

# compare the following
c(A, B, C) # unnamed
nc(A, B, C) # named
nc(this=A, B, C) # respects override named (same as c() and list() )
nc(this=A, B, C, use.names = TRUE) # preserve original name
The function PBA searches a specified interval for a root
(i.e., zero) of the function f(x) with respect to its first argument.
However, this function differs from deterministic cousins such as
uniroot in that f may contain stochastic error
components, and a Bayesian interval is instead provided indicating where the root
is likely to lie. Note that it is assumed that E[f(x)] is non-decreasing
in x and that the root lies within the supplied search interval (evaluated
approximately when check.interval=TRUE).
See Waeber, Frazier, and Henderson (2013) for details.
PBA(
  f,
  interval,
  ...,
  p = 0.6,
  integer = FALSE,
  tol = if (integer) 0.01 else 1e-04,
  maxiter = 300L,
  miniter = 100L,
  wait.time = NULL,
  f.prior = NULL,
  resolution = 10000L,
  check.interval = TRUE,
  check.interval.only = FALSE,
  verbose = TRUE
)

## S3 method for class 'PBA'
print(x, ...)

## S3 method for class 'PBA'
plot(x, type = "posterior", main = "Probabilistic Bisection Posterior", ...)
f |
noisy function for which the root is sought |
interval |
a vector containing the end-points of the interval to be searched for the root |
... |
additional named arguments to be passed to f |
p |
assumed constant for probability of correct responses (must be > 0.5) |
integer |
logical; should the values of the root be considered integer
or numeric? The former uses a discrete grid to track the updates, while the
latter currently creates a grid with |
tol |
tolerance criteria for convergence based on average of the
|
maxiter |
the maximum number of iterations (default 300) |
miniter |
minimum number of iterations (default 100) |
wait.time |
(optional) instead of terminating after specific estimate criteria
are satisfied (e.g., |
f.prior |
density function indicating the likely location of the prior
(e.g., if root is within [0,1] then |
resolution |
constant indicating the
number of equally spaced grid points to track when integer = FALSE |
check.interval |
logical; should an initial check be made to determine
whether |
check.interval.only |
logical; return only TRUE or FALSE to test
whether there is a likely root given |
verbose |
logical; should the iterations and estimate be printed to the console? |
x |
an object of class PBA |
type |
type of plot to draw for PBA object. Can be either 'posterior' or 'history' to plot the PBA posterior distribution or the median-estimate iteration history
main |
plot title |
Horstein, M. (1963). Sequential transmission using noiseless feedback. IEEE Trans. Inform. Theory, 9(3):136-143.
Waeber, R., Frazier, P. I. & Henderson, S. G. (2013). Bisection Search with Noisy Responses. SIAM Journal on Control and Optimization, Society for Industrial & Applied Mathematics (SIAM), 51, 2261-2279.
# find x that solves f(x) - b = 0 for the following
f.root <- function(x, b = .6) 1 / (1 + exp(-x)) - b
f.root(.3)

xs <- seq(-3,3, length.out=1000)
plot(xs, f.root(xs), type = 'l', ylab = "f(x)", xlab='x', las=1)
abline(h=0, col='red')

retuni <- uniroot(f.root, c(0,1))
retuni
abline(v=retuni$root, col='blue', lty=2)

# PBA without noisy root
retpba <- PBA(f.root, c(0,1))
retpba
retpba$root
plot(retpba)
plot(retpba, type = 'history')

# Same problem, however root function is now noisy. Hence, need to solve
#  fhat(x) - b + e = 0, where E(e) = 0
f.root_noisy <- function(x) 1 / (1 + exp(-x)) - .6 + rnorm(1, sd=.02)
sapply(rep(.3, 10), f.root_noisy)

# uniroot "converges" unreliably
set.seed(123)
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root

# probabilistic bisection provides better convergence
retpba.noise <- PBA(f.root_noisy, c(0,1))
retpba.noise
plot(retpba.noise)
plot(retpba.noise, type = 'history')

## Not run: 
# ignore termination criteria and instead run for 30 seconds or 30000 iterations
retpba.noise_30sec <- PBA(f.root_noisy, c(0,1), wait.time = "0:30", maxiter=30000)
retpba.noise_30sec

## End(Not run)
This function is used to suppress information printed from external functions
that make internal use of message
and cat
, which
provide information in interactive R sessions. For simulations, the session
is not interactive, and therefore this type of output should be suppressed.
For similar behaviour for suppressing warning messages, see
manageWarnings
.
quiet(..., cat = TRUE, keep = FALSE, attr.name = "quiet.messages")
... |
the functional expression to be evaluated |
cat |
logical; also capture calls from cat |
keep |
logical; return a character vector of the messages/concatenate
and print strings as an attribute to the resulting object from |
attr.name |
attribute name to use when keep = TRUE |
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
myfun <- function(x, warn=FALSE){
    message('This function is rather chatty')
    cat("It even prints in different output forms!\n")
    message('And even at different....')
    cat("...times!\n")
    if(warn)
        warning('It may even throw warnings!')
    x
}

out <- myfun(1)
out

# tell the function to shhhh
out <- quiet(myfun(1))
out

# which messages are suppressed? Extract stored attribute
out <- quiet(myfun(1), keep = TRUE)
attr(out, 'quiet.messages')

# Warning messages still get through (see manageWarnings(suppress)
#  for better alternative than using suppressWarnings())
out2 <- myfun(2, warn=TRUE) |> quiet() # warning gets through
out2

# suppress warning message explicitly, allowing others to be raised if present
myfun(2, warn=TRUE) |> quiet() |>
   manageWarnings(suppress='It may even throw warnings!')
Computes the relative absolute bias given the bias estimates for multiple estimators.
RAB(x, percent = FALSE, unname = FALSE)
x |
a |
percent |
logical; change returned result to percentage by multiplying by 100? Default is FALSE |
unname |
logical; apply unname to the results to remove any variable names? |
returns a vector
of absolute bias ratios indicating the relative bias
effects compared to the first estimator. Values less than 1 indicate better bias estimates
than the first estimator, while values greater than 1 indicate worse bias than the first estimator
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
pop <- 1
samp1 <- rnorm(5000, 1)
bias1 <- bias(samp1, pop)
samp2 <- rnorm(5000, 1)
bias2 <- bias(samp2, pop)

RAB(c(bias1, bias2))
RAB(c(bias1, bias2), percent = TRUE) # as a percentage
This function combines two or more Monte Carlo simulations executed by
SimDesign's runSimulation function which, for all
intents and purposes, could have been executed in a single run.
This situation arises when a simulation has been completed, but
the Design object was later modified to include more levels in the
defined simulation factors. Rather than re-executing the previously completed
simulation combinations, only the new combinations need to be evaluated
into a separate object, after which the results are rbind-ed together to create
the complete object of combinations.
## S3 method for class 'SimDesign'
rbind(...)
... |
two or more |
same object that is returned by runSimulation
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
## Not run: 
# modified example from runSimulation()

Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2))

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, sd=SD))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- mean(dat) # mean of the sample data vector
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- c(mu=mean(results), SE=sd(results)) # mean and SD summary of the sample means
    ret
}

Final1 <- runSimulation(design=Design, replications=1000,
                        generate=Generate, analyse=Analyse, summarise=Summarise)
Final1

###
# later decide that N = 30 should have also been investigated. Rather than
# running the following object ....
newDesign <- createDesign(N = c(10, 20, 30),
                          SD = c(1, 2))

# ... only the new subset levels are executed to save time
subDesign <- subset(newDesign, N == 30)
subDesign

Final2 <- runSimulation(design=subDesign, replications=1000,
                        generate=Generate, analyse=Analyse, summarise=Summarise)
Final2

# glue results together by row into one object as though the complete 'Design'
#  object were run all at once
Final <- rbind(Final1, Final2)
Final
summary(Final)

## End(Not run)
Computes the relative difference statistic of the form (est - pop)/ pop
, which
is equivalent to the form est/pop - 1
. If matrices are supplied then
an equivalent matrix variant will be used of the form
(est - pop) * solve(pop)
. Values closer to 0 indicate better
relative parameter recovery. Note that for single variable inputs this is equivalent to
bias(..., type = 'relative')
.
RD(est, pop, as.vector = TRUE, unname = FALSE)
est |
a |
pop |
a |
as.vector |
logical; always wrap the result in a |
unname |
logical; apply unname to the results to remove any variable names? |
returns a vector
or matrix
depending on the inputs and whether
as.vector
was used
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
# vector
pop <- seq(1, 100, length.out=9)
est1 <- pop + rnorm(9, 0, .2)
(rds <- RD(est1, pop))
summary(rds)

# matrix
pop <- matrix(c(1:8, 10), 3, 3)
est2 <- pop + rnorm(9, 0, .2)
RD(est2, pop, as.vector = FALSE)
(rds <- RD(est2, pop))
summary(rds)
Computes the relative efficiency given the RMSE (default) or MSE values for multiple estimators.
RE(x, MSE = FALSE, percent = FALSE, unname = FALSE)
x |
a |
MSE |
logical; are the input values mean squared errors instead of root mean square errors? |
percent |
logical; change returned result to percentage by multiplying by 100? Default is FALSE |
unname |
logical; apply unname to the results to remove any variable names? |
returns a vector
of variance ratios indicating the relative efficiency compared
to the first estimator. Values less than 1 indicate better efficiency than the first
estimator, while values greater than 1 indicate worse efficiency than the first estimator
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
pop <- 1
samp1 <- rnorm(100, 1, sd = 0.5)
RMSE1 <- RMSE(samp1, pop)
samp2 <- rnorm(100, 1, sd = 1)
RMSE2 <- RMSE(samp2, pop)

RE(c(RMSE1, RMSE2))
RE(c(RMSE1, RMSE2), percent = TRUE) # as a percentage

# using MSE instead
mse <- c(RMSE1, RMSE2)^2
RE(mse, MSE = TRUE)
This function supports the rejection sampling (i.e., accept-reject) approach to drawing values from seemingly difficult (probability) density functions by sampling values from more manageable proxy distributions.
rejectionSampling(
  n,
  df,
  dg,
  rg,
  M,
  method = "optimize",
  interval = NULL,
  logfuns = FALSE,
  maxM = 1e+05,
  parstart = rg(1L),
  ESRS_Mstart = 1.0001
)
n |
number of samples to draw |
df |
the desired (potentially un-normed) density function to draw
independent samples from. Must be in the form of a |
dg |
the proxy (potentially un-normed) density function to
draw samples from in lieu of drawing samples from |
rg |
the proxy random number generation function, associated with
|
M |
the upper-bound of the ratio of probability density functions to help
minimize the number of discarded draws and define the corresponding
rescaled proposal envelope. When missing, |
method |
when M is missing, the optimization of M is done either by
finding the mode of the log-density values ( |
interval |
when M is missing, for univariate density function draws,
the interval to search within via |
logfuns |
logical; have the |
maxM |
numeric cut-off; if, when optimizing M, the value is greater than this cut-off then stop, as the resulting sampler would likely be too inefficient, or the optimization is failing |
parstart |
starting value vector for optimization of M in multidimensional distributions |
ESRS_Mstart |
starting M value for the ESRS algorithm |
The accept-reject algorithm is a flexible approach to obtaining i.i.d. draws from
a difficult-to-sample-from (probability) density function where either the
transformation method fails or the inverse transform method is
difficult to manage. The algorithm does so by sampling from
a more "well-behaved" proxy distribution (with identical support, up to some
proportionality constant M that reshapes the proposal density
to envelope the target density), and accepts the
draws if they are likely within the target density. Hence, the closer the
shape of dg(x) is to the desired df(x), the more likely the draws
are to be accepted; otherwise, many iterations of the accept-reject algorithm
may be required, which decreases the computational efficiency.
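As a from-scratch illustration of this accept-reject logic (independent of rejectionSampling(); the beta target mirrors the first example below):

set.seed(1)
df <- function(x) dbeta(x, 2.7, 6.3)  # target density
dg <- function(x) dunif(x)            # proposal density (same support)
rg <- function(n) runif(n)            # proposal sampler
M  <- optimize(function(x) df(x) / dg(x), c(0, 1), maximum = TRUE)$objective

draws <- replicate(10000, {
    repeat {
        x <- rg(1)
        if (runif(1) <= df(x) / (M * dg(x))) break # accept w.p. df/(M*dg)
    }
    x
})
hist(draws, 50, freq = FALSE)
curve(df, add = TRUE, col = 'red')    # closely matches the target density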
returns a vector or matrix of draws (corresponding to the
output class from rg
) from the desired df
Phil Chalmers [email protected]
Caffo, B. S., Booth, J. G., and Davison, A. C. (2002). Empirical supremum
rejection sampling. Biometrika
, 89, 745–754.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable
Monte Carlo Simulations with the SimDesign Package.
The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics
with Monte Carlo simulation. Journal of Statistics Education, 24
(3),
136-156. doi:10.1080/10691898.2016.1246953
## Not run: 
# Generate X ~ beta(a,b), where a and b are a = 2.7 and b = 6.3,
#  and the support is Y ~ Unif(0,1)
dfn <- function(x) dbeta(x, shape1 = 2.7, shape2 = 6.3)
dgn <- function(x) dunif(x, min = 0, max = 1)
rgn <- function(n) runif(n, min = 0, max = 1)

# when df and dg both integrate to 1, acceptance probability = 1/M
M <- rejectionSampling(df=dfn, dg=dgn, rg=rgn)
M
dat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M)
hist(dat, 100)
hist(rbeta(10000, 2.7, 6.3), 100) # compare

# obtain empirical estimate of M via ESRS method
M <- rejectionSampling(1000, df=dfn, dg=dgn, rg=rgn, method='ESRS')
M

# generate using better support function (here, Y ~ beta(2,6)),
#  and use log setup in initial calls (more numerically accurate)
dfn <- function(x) dbeta(x, shape1 = 2.7, shape2 = 6.3, log = TRUE)
dgn <- function(x) dbeta(x, shape1 = 2, shape2 = 6, log = TRUE)
rgn <- function(n) rbeta(n, shape1 = 2, shape2 = 6)
M <- rejectionSampling(df=dfn, dg=dgn, rg=rgn, logfuns=TRUE) # better M
M

## Alternative estimation of M
## M <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, logfuns=TRUE,
##                        method='ESRS')

dat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M, logfuns=TRUE)
hist(dat, 100)

#------------------------------------------------------
# sample from wonky (and non-normalized) density function, like below
dfn <- function(x){
    ret <- numeric(length(x))
    ret[x <= .5] <- dnorm(x[x <= .5])
    ret[x > .5] <-  dnorm(x[x > .5]) + dchisq(x[x > .5], df = 2)
    ret
}
y <- seq(-5,5, length.out = 1000)
plot(y, dfn(y), type = 'l', main = "Function to sample from")

# choose dg/rg functions that have support within the range [-inf, inf]
rgn <- function(n) rnorm(n, sd=4)
dgn <- function(x) dnorm(x, sd=4)

## example M height from above graphic
##  (M selected using ESRS to help stochastically avoid local mins)
M <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, method='ESRS')
M
lines(y, dgn(y)*M, lty = 2)

dat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M)
hist(dat, 100, prob=TRUE)

# true density (normalized)
C <- integrate(dfn, -Inf, Inf)$value
ndfn <- function(x) dfn(x) / C
curve(ndfn, col='red', lwd=2, add=TRUE)

#-----------------------------------------------------
# multivariate distribution
dfn <- function(x) sum(log(c(dnorm(x[1]) + dchisq(x[1], df = 5),
                             dnorm(x[2], -1, 2))))
rgn <- function(n) c(rnorm(n, sd=3), rnorm(n, sd=3))
dgn <- function(x) sum(log(c(dnorm(x[1], sd=3), dnorm(x[1], sd=3))))

# M <- rejectionSampling(df=dfn, dg=dgn, rg=rgn, logfuns=TRUE)
dat <- rejectionSampling(5000, df=dfn, dg=dgn, rg=rgn, M=4.6, logfuns=TRUE)
hist(dat[,1], 30)
hist(dat[,2], 30)
plot(dat)

## End(Not run)
When runSimulation()
uses the option save_results = TRUE
the R replication results from the Generate-Analyse functions are
stored to the hard drive. As such, additional summarise components
may be required at a later time, whereby the respective .rds
files
must be read back into R to be summarised. This function performs
the reading of these files, application of a provided summarise function,
and final collection of the respective results.
reSummarise(
  summarise,
  dir = NULL,
  files = NULL,
  results = NULL,
  Design = NULL,
  fixed_objects = NULL,
  boot_method = "none",
  boot_draws = 1000L,
  CI = 0.95,
  prefix = "results-row"
)
summarise |
a summarise function to apply to the read-in files.
See |
dir |
directory pointing to the .rds files to be
read-in that were saved from |
files |
(optional) names of files to read-in. If |
results |
(optional) the results of Alternatively, if |
Design |
(optional) if |
fixed_objects |
(optional) see |
boot_method |
method for performing non-parametric bootstrap confidence intervals
for the respective meta-statistics computed by the |
boot_draws |
number of non-parametric bootstrap draws to sample for the |
CI |
bootstrap confidence interval level (default is 95%) |
prefix |
character indicating prefix used for stored files |
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
Design <- createDesign(N = c(10, 20, 30))

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5)
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat), median=median(dat)) # mean/median of sample data
    ret
}

Summarise <- function(condition, results, fixed_objects){
    colMeans(results)
}

## Not run: 
# run the simulation
runSimulation(design=Design, replications=50,
              generate=Generate, analyse=Analyse, summarise=Summarise,
              save_results=TRUE,
              save_details = list(save_results_dirname='simresults'))

res <- reSummarise(Summarise, dir = 'simresults/')
res

Summarise2 <- function(condition, results, fixed_objects){
    ret <- c(mean_ests=colMeans(results), SE=colSDs(results))
    ret
}

res2 <- reSummarise(Summarise2, dir = 'simresults/')
res2

SimClean(dir='simresults/')

## End(Not run)

###
# Similar, but with results stored within the final object
res <- runSimulation(design=Design, replications=50, store_results = TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

# same summarise but with bootstrapping
res2 <- reSummarise(Summarise, results = res, boot_method = 'basic')
res2
Generate multivariate non-normal distributions using the fifth-order polynomial method described by Headrick (2002).
rHeadrick(
  n,
  mean = rep(0, nrow(sigma)),
  sigma = diag(length(mean)),
  skew = rep(0, nrow(sigma)),
  kurt = rep(0, nrow(sigma)),
  gam3 = NaN,
  gam4 = NaN,
  return_coefs = FALSE,
  coefs = NULL,
  control = list(trace = FALSE, max.ntry = 15, obj.tol = 1e-10, n.valid.sol = 1)
)
n |
number of samples to draw |
mean |
a vector of k elements for the mean of the variables |
sigma |
desired k x k covariance matrix between the k non-normal variables |
skew |
a vector of k elements for the skewness of the variables |
kurt |
a vector of k elements for the kurtosis of the variables |
gam3 |
(optional) explicitly supply the gamma 3 value? Default computes this internally |
gam4 |
(optional) explicitly supply the gamma 4 value? Default computes this internally |
return_coefs |
logical; return the estimated coefficients only? See below regarding why this is useful. |
coefs |
(optional) supply previously estimated coefficients? This is useful when multiple
data sets must be drawn, and avoids repeating the same computations. Must be the object returned after passing
|
control |
a list of control parameters when locating the polynomial coefficients |
This function is primarily a wrapper for the code written by Oscar L. Olvera Astivia (last edited Feb 26, 2015) with some modifications (e.g., better starting values for the Newton optimizer, passing previously saved coefs, etc).
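In outline, Headrick's (2002) approach transforms standard normal deviates through a fifth-order power polynomial, y = c0 + c1 z + c2 z^2 + c3 z^3 + c4 z^4 + c5 z^5, with the constants solved so that y attains the requested moments. A univariate sketch, where the coefficient values are purely illustrative and not solved from any target skewness/kurtosis (which rHeadrick performs internally):

# fifth-order polynomial transform sketch; the cfs values below are
# illustrative only and do not correspond to solved target moments
poly5 <- function(z, cfs)
    cfs[1] + cfs[2]*z + cfs[3]*z^2 + cfs[4]*z^3 + cfs[5]*z^4 + cfs[6]*z^5
z <- rnorm(10000)
hist(poly5(z, cfs = c(0, 0.9, 0.15, 0.05, -0.01, 0.002)), 100)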
Oscar L. Olvera Astivia and Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
Headrick, T. C. (2002). Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions. Computational Statistics & Data Analysis, 40, 685-711.
Olvera Astivia, O. L., & Zumbo, B. D. (2015). A Cautionary Note on the Use of the Vale and Maurelli Method to Generate Multivariate, Nonnormal Data for Simulation Purposes. Educational and Psychological Measurement, 75, 541-567.
## Not run: 
set.seed(1)

N <- 200
mean <- c(rep(0,4))
Sigma <- matrix(.49, 4, 4)
diag(Sigma) <- 1
skewness <- c(rep(1,4))
kurtosis <- c(rep(2,4))

nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis)
# cor(nonnormal)
# psych::describe(nonnormal)

#-----------
# compute the coefficients, then supply them back to the function to avoid
# extra computations
cfs <- rHeadrick(N, mean, Sigma, skewness, kurtosis, return_coefs = TRUE)
cfs

# compare
system.time(nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis))
system.time(nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis,
                                   coefs=cfs))

## End(Not run)
Efficiently generate positive and negative integer values with (default) or without replacement.
This function is mainly a wrapper to the sample.int
function (itself a much more efficient integer sampler than the more general
sample), however it is intended
to work with both positive and negative integer ranges, since sample.int
only returns positive integer values that must begin at 1L.
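The wrapper idea reduces to an index shift: sample.int() draws from 1 to k, and adding back the lower bound recovers an arbitrary finite integer range. A minimal sketch of this logic (a hypothetical re-implementation; the package internals may differ in validation and edge-case handling):

# offset idea behind rint(): draw from 1..k, then shift by min - 1
rint_sketch <- function(n, min, max, replace = TRUE, prob = NULL) {
    stopifnot(is.finite(min), is.finite(max), max >= min)
    k <- max - min + 1 # size of the candidate integer range
    min + sample.int(k, n, replace = replace, prob = prob) - 1L
}
summary(rint_sketch(1000, min = -20, max = 20))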
rint(n, min, max, replace = TRUE, prob = NULL)
n |
number of samples to draw |
min |
lower limit of the distribution. Must be finite |
max |
upper limit of the distribution. Must be finite |
replace |
should sampling be with replacement? |
prob |
a vector of probability weights for obtaining the elements of the vector being sampled |
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
set.seed(1)

# sample 1000 integer values within 20 to 100
x <- rint(1000, min = 20, max = 100)
summary(x)

# sample 1000 integer values within 100 to 100 million
x <- rint(1000, min = 100, max = 1e8)
summary(x)

# compare speed to sample()
system.time(x <- rint(1000, min = 100, max = 1e8))
system.time(x2 <- sample(100:1e8, 1000, replace = TRUE))

# sample 1000 integer values within -20 to 20
x <- rint(1000, min = -20, max = 20)
summary(x)
Function generates data in the form of symmetric matrices from the inverse Wishart distribution given a covariance matrix and degrees of freedom.
rinvWishart(n = 1, df, sigma)
n |
number of matrix observations to generate. By default |
df |
degrees of freedom |
sigma |
positive definite covariance matrix |
a numeric matrix with columns equal to ncol(sigma)
when n = 1
, or a list
of n
matrices with the same properties
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
# random inverse Wishart matrix given variances [3,6], covariance 2, and df=15
sigma <- matrix(c(3,2,2,6), 2, 2)
x <- rinvWishart(sigma = sigma, df = 15)
x

# list of matrices
x <- rinvWishart(20, sigma = sigma, df = 15)
x
Generate non-normal distributions using the multivariate g-and-h distribution. Can be used to generate several different classes of univariate and multivariate distributions.
rmgh(n, g, h, mean = rep(0, length(g)), sigma = diag(length(mean)))
n |
number of samples to draw |
g |
the g parameter(s) which control the skew of a distribution in terms of both direction and magnitude |
h |
the h parameter(s) which control the tail weight or elongation of a distribution and is positively related with kurtosis |
mean |
a vector of k elements for the mean of the variables |
sigma |
desired k x k covariance matrix between the k non-normal variables |
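For reference, the univariate g-and-h transformation maps standard normal deviates z through q(z) = ((exp(g z) - 1) / g) exp(h z^2 / 2), with the limiting form z exp(h z^2 / 2) as g approaches 0. The sketch below is illustrative only and is not the package's internal implementation (which additionally handles the multivariate mean/sigma structure):

# univariate Tukey g-and-h transform sketch (illustration only)
gh_transform <- function(z, g, h) {
    if (abs(g) < 1e-8) z * exp(h * z^2 / 2) # g -> 0 limiting case
    else (exp(g * z) - 1) / g * exp(h * z^2 / 2)
}
z <- rnorm(10000)
hist(gh_transform(z, g = 0.5, h = 0.1), 100) # right-skewed, heavier tails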
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
set.seed(1)

# univariate
norm <- rmgh(10000, 1e-5, 0)
hist(norm)

skew <- rmgh(10000, 1/2, 0)
hist(skew)

neg_skew_platykurtic <- rmgh(10000, -1, -1/2)
hist(neg_skew_platykurtic)

# multivariate
sigma <- matrix(c(2,1,1,4), 2)
mean <- c(-1, 1)
twovar <- rmgh(10000, c(-1/2, 1/2), c(0,0), mean=mean, sigma=sigma)
hist(twovar[,1])
hist(twovar[,2])
plot(twovar)
Computes the average deviation (root mean square error; also known as the root mean square deviation) of a sample estimate from the parameter value. Accepts estimate and parameter values, as well as estimate values which are in deviation form.
RMSE(estimate, parameter = NULL, type = "RMSE", MSE = FALSE,
     percent = FALSE, unname = FALSE)

RMSD(estimate, parameter = NULL, type = "RMSE", MSE = FALSE,
     percent = FALSE, unname = FALSE)
estimate |
a |
parameter |
a |
type |
type of deviation to compute. Can be |
MSE |
logical; return the mean square error equivalent of the results instead of the root
mean-square error (in other words, the result is squared)? Default is |
percent |
logical; change returned result to percentage by multiplying by 100? Default is FALSE |
unname |
logical; apply |
returns a numeric
vector indicating the overall average deviation in the estimates
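For the default type = "RMSE" the computation is the familiar square root of the mean squared deviation, which can be verified directly:

# sanity check of the default computation against the definition
samp <- rnorm(100, 1, sd = 0.5)
RMSE(samp, parameter = 1) # package computation
sqrt(mean((samp - 1)^2)) # same quantity computed directly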
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
MAE
pop <- 1
samp <- rnorm(100, 1, sd = 0.5)
RMSE(samp, pop)

dev <- samp - pop
RMSE(dev)
RMSE(samp, pop, type = 'NRMSE')
RMSE(dev, type = 'NRMSE')
RMSE(dev, pop, type = 'SRMSE')
RMSE(samp, pop, type = 'CV')
RMSE(samp, pop, type = 'RMSLE')

# percentage reported
RMSE(samp, pop, type = 'NRMSE')
RMSE(samp, pop, type = 'NRMSE', percent = TRUE)

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
RMSE(mat, parameter = 2)
RMSE(mat, parameter = c(2, 3))

# different parameter associated with each column
mat <- cbind(M1=rnorm(1000, 2, sd = 0.25), M2 = rnorm(1000, 3, sd = .25))
RMSE(mat, parameter = c(2,3))

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
RMSE(df, parameter = c(2,2))

# vector of parameters, one per estimate
parameters <- 1:10
estimates <- parameters + rnorm(10)
RMSE(estimates, parameters)
Function generates data from the multivariate normal distribution given some mean vector and/or covariance matrix.
rmvnorm(n, mean = rep(0, nrow(sigma)), sigma = diag(length(mean)))
n |
number of observations to generate |
mean |
mean vector, default is |
sigma |
positive definite covariance matrix, default is |
a numeric matrix with columns equal to length(mean)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
# random normal values with means [5, 10], variances [3, 6], and covariance 2
sigma <- matrix(c(3,2,2,6), 2, 2)
mu <- c(5,10)
x <- rmvnorm(1000, mean = mu, sigma = sigma)
head(x)
summary(x)
plot(x[,1], x[,2])
Function generates data from the multivariate t distribution given a covariance matrix, non-centrality parameter (or mode), and degrees of freedom.
rmvt(n, sigma, df, delta = rep(0, nrow(sigma)), Kshirsagar = FALSE)
n |
number of observations to generate |
sigma |
positive definite covariance matrix |
df |
degrees of freedom. |
delta |
the vector of non-centrality parameters of length |
Kshirsagar |
logical; determines whether data should be generated using the non-centrality parameters directly or adjusted so that delta represents the mode of the distribution. The default uses the mode |
a numeric matrix with columns equal to ncol(sigma)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
# random t values given variances [3,6], covariance 2, and df = 15
sigma <- matrix(c(3,2,2,6), 2, 2)
x <- rmvt(1000, sigma = sigma, df = 15)
head(x)
summary(x)
plot(x[,1], x[,2])
Function performs stochastic root solving for the provided f(x)
using the Robbins-Monro (1951) algorithm. Differs from deterministic
cousins such as uniroot
in that f
may contain stochastic error
components, where the root is obtained through the running-average method
provided by a noise filter (see also PBA
).
Assumes that E[f(x)]
is non-decreasing in x
.
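In outline, the algorithm repeatedly applies the update p <- p - a_k * f(p) with a decaying gain sequence a_k. The sketch below strips out the package's convergence checks, miniter/k logic, and Polyak-Juditsky averaging, and is intended only to convey the core iteration:

# minimal Robbins-Monro iteration sketch; assumes E[f(x)] is non-decreasing.
# The gain sequence mirrors the default fn.a: a/(iter + c)^b with c = 0
rm_sketch <- function(f, p, maxiter = 500L, a = 1, b = 1/2) {
    for (iter in seq_len(maxiter)) {
        a_k <- a / iter^b # decaying gain term
        p <- p - a_k * f(p) # stochastic root-finding update
    }
    p
}
rm_sketch(function(x) 1 / (1 + exp(-x)) - .6 + rnorm(1, sd = .02), p = .9)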
RobbinsMonro(
  f,
  p,
  ...,
  Polyak_Juditsky = FALSE,
  maxiter = 500L,
  miniter = 100L,
  k = 3L,
  tol = 1e-05,
  verbose = TRUE,
  fn.a = function(iter, a = 1, b = 1/2, c = 0, ...) a/(iter + c)^b
)

## S3 method for class 'RM'
print(x, ...)

## S3 method for class 'RM'
plot(x, par = 1, main = NULL, Polyak_Juditsky = FALSE, ...)
f |
noisy function for which the root is sought |
p |
vector of starting values to be passed as |
... |
additional named arguments to be passed to |
Polyak_Juditsky |
logical; apply the Polyak and Juditsky (1992)
running-average method? Returns the final running average estimate
using the Robbins-Monro updates (also applies to |
maxiter |
the maximum number of iterations (default 500) |
miniter |
minimum number of iterations (default 100) |
k |
number of consecutive |
tol |
tolerance criteria for convergence on the changes in the
updated |
verbose |
logical; should the iterations and estimate be printed to the console? |
fn.a |
function to create the gain term used in each update. Note that if a different function is provided it must satisfy the usual
Robbins-Monro step-size conditions (positive, decreasing terms whose sum diverges while the sum of their squares converges) |
x |
an object of class |
par |
which parameter in the original vector |
main |
plot title |
Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of Stochastic Approximation by Averaging. SIAM Journal on Control and Optimization, 30(4):838.
Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann.Math.Statistics, 22:400-407.
Spall, J.C. (2000). Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Autom. Control 45, 1839-1853.
# find x that solves f(x) - b = 0 for the following
f.root <- function(x, b = .6) 1 / (1 + exp(-x)) - b
f.root(.3)

xs <- seq(-3, 3, length.out=1000)
plot(xs, f.root(xs), type = 'l', ylab = "f(x)", xlab='x')
abline(h=0, col='red')

retuni <- uniroot(f.root, c(0,1))
retuni
abline(v=retuni$root, col='blue', lty=2)

# Robbins-Monro without noisy root, start with p=.9
retrm <- RobbinsMonro(f.root, .9)
retrm
plot(retrm)

# Same problem, however root function is now noisy. Hence, need to solve
# fhat(x) - b + e = 0, where E(e) = 0
f.root_noisy <- function(x) 1 / (1 + exp(-x)) - .6 + rnorm(1, sd=.02)
sapply(rep(.3, 10), f.root_noisy)

# uniroot "converges" unreliably
set.seed(123)
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root

# Robbins-Monro provides better convergence
retrm.noise <- RobbinsMonro(f.root_noisy, .9)
retrm.noise
plot(retrm.noise)

# different power (b) for fn.a()
retrm.b2 <- RobbinsMonro(f.root_noisy, .9, b = .01)
retrm.b2
plot(retrm.b2)

# use Polyak-Juditsky averaging (b should be closer to 0 to work well)
retrm.PJ <- RobbinsMonro(f.root_noisy, .9, b = .01,
                         Polyak_Juditsky = TRUE)
retrm.PJ # final Polyak_Juditsky estimate
plot(retrm.PJ) # Robbins-Monro history
plot(retrm.PJ, Polyak_Juditsky = TRUE) # Polyak_Juditsky history
Computes the relative standard error ratio given a set of estimated standard errors (SE) and the deviation across the R simulation replications (SD). The ratio is formed by taking the expectation of the SE terms and comparing this expectation to the observed variability of the respective parameter estimates across the R replications (the ratio should equal 1). This is used to roughly evaluate whether the SEs advertised by a given estimation method match the sampling variability of the respective estimates across samples.
RSE(SE, ests, unname = FALSE)
SE |
a |
ests |
a |
unname |
logical; apply |
returns a vector of standard error ratios, RSE = E(SE)/SD
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
R <- 10000
par_ests <- cbind(rnorm(R), rnorm(R, sd=1/10), rnorm(R, sd=1/15))
colnames(par_ests) <- paste0("par", 1:3)
(SDs <- colSDs(par_ests))

SEs <- cbind(1 + rnorm(R, sd=.01), 1/10 + rnorm(R, sd=.01),
             1/15 + rnorm(R, sd=.01))
(E_SEs <- colMeans(SEs))

RSE(SEs, par_ests)

# equivalent to the form
colMeans(SEs) / SDs
Function generates data from a supplied random number generating function, where sampled values are constrained to fall within a particular range. Sampled values outside this range are discarded and re-sampled until the desired criterion has been met.
rtruncate(n, rfun, range, ..., redraws = 100L)
n |
number of observations to generate. This should be the first argument
passed to |
rfun |
a function to generate random values. Function can return
a numeric/integer vector or matrix, and additional arguments
required for this function are passed through the argument |
range |
a numeric vector of length two, where the first element
indicates the lower bound and the second the upper bound. When values are
generated outside these two bounds then data are redrawn until the bounded
criteria is met. When the output of |
... |
additional arguments to be passed to |
redraws |
the maximum number of redraws to take before terminating the
iterative sequence. This is in place as a safety in case the |
In simulations it is often useful to draw numbers from truncated distributions
rather than across the full theoretical range. For instance, sampling parameters
within the range [-4,4] from a normal distribution. The rtruncate
function has been designed to accept any sampling function, where the first
argument is the number of values to sample, and will draw values iteratively
until the number of values within the specified bound are obtained.
In situations where it is unlikely for the bounds to be located
(e.g., sampling from a standard normal distribution where all values are
within [-10,-6]) then the sampling scheme will throw an error if too many
re-sampling executions are required (the default will stop if more than 100
calls to rfun
are required).
either a numeric vector or matrix, where all values are within the
desired range
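A stripped-down sketch of this redraw loop for vector-valued rfun output (the function name is hypothetical; the package version additionally supports matrix output with per-column bounds):

# redraw out-of-range values until all fall within the bounds, or give up
rtruncate_sketch <- function(n, rfun, range, ..., redraws = 100L) {
    out <- rfun(n, ...)
    for (i in seq_len(redraws)) {
        bad <- out < range[1] | out > range[2]
        if (!any(bad)) return(out)
        out[bad] <- rfun(sum(bad), ...) # redraw only offending values
    }
    stop("Bounded values could not be obtained within 'redraws' resamples")
}
summary(rtruncate_sketch(1000, rnorm, c(-2, 3)))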
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable
Monte Carlo Simulations with the SimDesign Package.
The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics
with Monte Carlo simulation. Journal of Statistics Education, 24
(3),
136-156. doi:10.1080/10691898.2016.1246953
# n = 1000 truncated normal vector between [-2,3]
vec <- rtruncate(1000, rnorm, c(-2,3))
summary(vec)

# truncated correlated multivariate normal between [-1,4]
mat <- rtruncate(1000, rmvnorm, c(-1,4),
                 sigma = matrix(c(2,1,1,1),2))
summary(mat)

# truncated correlated multivariate normal between [-1,4] for the
# first column and [0,3] for the second column
mat <- rtruncate(1000, rmvnorm, cbind(c(-1,4), c(0,3)),
                 sigma = matrix(c(2,1,1,1),2))
summary(mat)

# truncated chi-square with df = 4 between [2,6]
vec <- rtruncate(1000, rchisq, c(2,6), df = 4)
summary(vec)
This function has the same purpose as runSimulation
, however
rather than evaluating each row in a design
object (potentially with
parallel computing architecture) this function evaluates the simulation
per independent row condition. This is mainly useful when distributing the
jobs to HPC clusters where a job array number is available (e.g., via SLURM),
where the simulation results must be saved to independent files as they
complete. Use of expandDesign
is useful for distributing replications
to different jobs, while genSeeds
is required to ensure high-quality
random number generation across the array submissions. See the associated
vignette for a brief tutorial of this setup.
runArraySimulation(
  design,
  ...,
  replications,
  iseed,
  filename,
  dirname = NULL,
  arrayID = getArrayID(),
  array2row = function(arrayID) arrayID,
  addArrayInfo = TRUE,
  parallel = FALSE,
  cl = NULL,
  ncores = parallelly::availableCores(omit = 1L),
  save_details = list(),
  control = list()
)
design |
design object containing simulation conditions on a per row basis.
This function is designed to submit each row as an independent job on an HPC cluster.
See |
... |
additional arguments to be passed to |
replications |
number of independent replications to perform per
condition (i.e., each row in |
iseed |
initial seed to be passed to |
filename |
file name to save simulation files to (does not need to
specify extension). However, the array ID will be appended to each
|
dirname |
directory to save the files associated with |
arrayID |
array identifier from the scheduler. Must be a number between
1 and |
array2row |
user defined function with the single argument For example, if each |
addArrayInfo |
logical; should the array ID and original design row number
be added to the |
parallel |
logical; use parallel computations via a "SOCK" cluster?
Only useful when the instruction shell file requires more than 1 core
(number of cores detected via |
cl |
cluster definition. If omitted a "SOCK" cluster will be defined |
ncores |
number of cores to use when |
save_details |
optional list of extra file saving details.
See |
control |
control list passed to
Similarly, |
Due to the nature of how the replications are split it is important that
the L'Ecuyer-CMRG (2002) method of random seeds is used across all
array ID submissions (cf. runSimulation
's parallel
approach, which uses this method to distribute random seeds within
each isolated condition rather than between all conditions). As such, this
function requires the seeds to be generated using
genSeeds
with the iseed
and arrayID
inputs to ensure that each job is analyzing a high-quality
set of random numbers via L'Ecuyer-CMRG's (2002) method, incremented using
nextRNGStream
.
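As a sketch of that seeding pattern (assuming the genSeeds() interface with the iseed and arrayID inputs noted above):

# iseed must be identical across ALL array submissions, while the
# arrayID varies per job (hard-coded here purely for illustration)
iseed <- 554184288 # generated once beforehand with genSeeds()
arrayID <- 14L # in practice: getArrayID(type = 'slurm')
seed <- genSeeds(iseed = iseed, arrayID = arrayID)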
Additionally, for timed simulations on HPC clusters it is also recommended to pass a
control = list(max_time)
value to avoid discarding
conditions that require more than the specified time in the shell script.
The max_time
value should be less than the maximum time allocated
on the HPC cluster (e.g., approximately 90% of the allotted wall time, though this
depends on how long each replication takes). Simulations with missing
replication information should submit a new set of jobs at a later time
to collect the missing replication information.
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
runSimulation
, expandDesign
,
genSeeds
, SimCheck
,
SimCollect
, getArrayID
library(SimDesign)

Design <- createDesign(N = c(10, 20, 30))

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5)
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat), median=median(dat)) # mean/median of sample data
    ret
}

Summarise <- function(condition, results, fixed_objects){
    colMeans(results)
}

## Not run: 
# define initial seed (do this only once to keep it constant!)
# iseed <- genSeeds()
iseed <- 554184288

### On cluster submission, the active array ID is obtained via getArrayID(),
### and therefore should be used in real SLURM submissions
arrayID <- getArrayID(type = 'slurm')

# However, the following example arrayID is set to
# the first row only for testing purposes
arrayID <- 1L

# run the simulation (results not caught on job submission, only files saved)
res <- runArraySimulation(design=Design, replications=50,
                          generate=Generate, analyse=Analyse,
                          summarise=Summarise, arrayID=arrayID,
                          iseed=iseed, filename='mysim') # saved as 'mysim-1.rds'
res
SimResults(res) # condition and replication count stored

# same, but evaluated with multiple cores
res <- runArraySimulation(design=Design, replications=50,
                          generate=Generate, analyse=Analyse,
                          summarise=Summarise, arrayID=arrayID,
                          parallel=TRUE, ncores=3,
                          iseed=iseed, filename='myparsim')
res
SimResults(res) # condition and replication count stored

dir()
SimClean(c('mysim-1.rds', 'myparsim-1.rds'))

########################
# Same submission job as above, however split the replications over multiple
# evaluations and combine when complete
Design5 <- expandDesign(Design, 5)
Design5

# iseed <- genSeeds()
iseed <- 554184288

# arrayID <- getArrayID(type = 'slurm')
arrayID <- 14L

# run the simulation (replications reduced per row, but same in total)
runArraySimulation(design=Design5, replications=10,
                   generate=Generate, analyse=Analyse,
                   summarise=Summarise, iseed=iseed,
                   filename='mylongsim', arrayID=arrayID)

res <- readRDS('mylongsim-14.rds')
res
SimResults(res) # condition and replication count stored

SimClean('mylongsim-14.rds')

###
# Emulate the arrayID distribution, storing all results in a 'sim/' folder
# (if 'sim/' does not exist in runArraySimulation() it will be
# created automatically)
dir.create('sim/')

# Emulate distribution to nrow(Design5) = 15 independent job arrays
## (just used for presentation purposes on local computer)
sapply(1:nrow(Design5), \(arrayID)
    runArraySimulation(design=Design5, replications=10,
                       generate=Generate, analyse=Analyse,
                       summarise=Summarise, iseed=iseed,
                       arrayID=arrayID, filename='condition',
                       dirname='sim', # files: "sim/condition-#.rds"
                       control = list(max_time="04:00:00",
                                      max_RAM="4GB"))) |> invisible()

# If necessary, conditions above will manually terminate before
# 4 hours and 4GB of RAM are used, returning any
# successfully completed results before the HPC session times
# out (provided .slurm script specified more than 4 hours)

# list saved files
dir('sim/')

# check that all files saved (warnings will be raised if missing files)
SimCheck('sim/') |> isTRUE()

condition14 <- readRDS('sim/condition-14.rds')
condition14
SimResults(condition14)

# aggregate simulation results into single file
final <- SimCollect('sim/')
final

# clean simulation directory
SimClean(dirs='sim/')

############
# same as above, however passing different amounts of information depending
# on the array ID
array2row <- function(arrayID){
    switch(arrayID,
        "1"=1:8,
        "2"=9:14,
        "3"=15)
}

# arrayID 1 does rows 1 through 8, arrayID 2 does 9 to 14
array2row(1)
array2row(2)
array2row(3) # arrayID 3 does 15 only

# emulate remote array distribution with only 3 arrays
sapply(1:3, \(arrayID)
    runArraySimulation(design=Design5, replications=10,
                       generate=Generate, analyse=Analyse,
                       summarise=Summarise, iseed=iseed,
                       arrayID=arrayID, filename='condition',
                       dirname='sim', array2row=array2row)) |> invisible()

# list saved files
dir('sim/')

# note that all row conditions are still stored separately, though the
# arrayID is now 2 instead
condition14 <- readRDS('sim/condition-14.rds')
condition14
SimResults(condition14)

# aggregate simulation results into single file
final <- SimCollect('sim/')
final

# clean simulation directory
SimClean(dirs='sim/')

## End(Not run)
This function runs a Monte Carlo simulation study given a set of predefined simulation functions,
design conditions, and number of replications. Results can be saved as temporary files in case of
interruptions and may be restored by re-running runSimulation
, provided that the respective temp
file can be found in the working directory. runSimulation
supports parallel
and cluster computing (with the parallel
and
future
packages; see also
runArraySimulation
for submitting array jobs to HPC clusters),
global and local debugging, error handling (including fail-safe
stopping when functions fail too often, even across nodes), provides bootstrap estimates of the
sampling variability (optional), and automatic tracking of error and warning messages
with their associated .Random.seed
states.
For convenience, all functions available in the R work-space are exported across all nodes
so that they are more easily accessible (however, other R objects are not, and therefore
must be passed to the fixed_objects
input to become available across nodes).
runSimulation(
  design,
  replications,
  generate,
  analyse,
  summarise,
  fixed_objects = NULL,
  packages = NULL,
  filename = NULL,
  debug = "none",
  load_seed = NULL,
  save = any(replications > 2),
  store_results = TRUE,
  save_results = FALSE,
  parallel = FALSE,
  ncores = parallelly::availableCores(omit = 1L),
  cl = NULL,
  notification = "none",
  beep = FALSE,
  sound = 1,
  CI = 0.95,
  seed = NULL,
  boot_method = "none",
  boot_draws = 1000L,
  max_errors = 50L,
  resume = TRUE,
  save_details = list(),
  control = list(),
  progress = TRUE,
  verbose = TRUE
)

## S3 method for class 'SimDesign'
summary(object, ...)

## S3 method for class 'SimDesign'
print(x, list2char = TRUE, ...)
design |
a |
replications |
number of independent replications to perform per
condition (i.e., each row in |
generate |
user-defined data and parameter generating function (or named list of functions).
See |
analyse |
user-defined analysis function (or named list of functions)
that acts on the data generated from
|
summarise |
optional (but strongly recommended) user-defined summary function
from Note that unlike the Generate and Analyse
steps, the Summarise portion is not as important to perfectly organize
as the results can be summarised later on by using the built-in
Omitting this function will return a tibble with the |
fixed_objects |
(optional) an object (usually a named |
packages |
a character vector of external packages to be used during the simulation (e.g.,
|
filename |
(optional) the name of the Note that if the same file name already exists in the working
directory at the time of saving then a new
file will be generated instead and a warning will be thrown. This helps to
avoid accidentally overwriting
existing files. Default is |
debug |
a string indicating where to initiate a If the For debugging specific rows in the Finally, users may place |
load_seed |
used to replicate an exact simulation state, which is
primarily useful for debugging purposes.
Input can be a character object indicating which file to load from when the
|
save |
logical; save the temporary simulation state to the hard-drive? This is useful
for simulations which require an extended amount of time, though for shorter simulations
can be disabled to slightly improve computational efficiency. When To recover your simulation at the last known location (having patched the issues in the
previous execution code) simply re-run the code you used to
initially define the simulation and the external file will automatically be detected and read-in.
Default is |
store_results |
logical; store the complete tables of simulation results
in the returned object? This is To extract these results
pass the returned object to either |
save_results |
logical; save the results returned from WARNING: saving results to your hard-drive can fill up space very quickly for
larger simulations. Be sure to
test this option using a smaller number of replications before the full Monte
Carlo simulation is performed.
See also |
parallel |
logical; use parallel processing from the Alternatively, if the |
ncores |
number of cores to be used in parallel execution (ignored if using the
|
cl |
cluster object defined by If the |
notification |
an optional character vector input that can be used to send
Pushbullet notifications from a configured
computer. This reports information such as the total execution time, the condition
completed, and error/warning
messages recorded. This arguments assumes that users have already A) registered for
a Pushbullet account,
B) installed the application on their mobile device and computer, and C) created an
associated JSON file of the form
To utilize the |
beep |
logical; call the |
sound |
|
CI |
bootstrap confidence interval level (default is 95%) |
seed |
a vector or list of integers to be used for reproducibility.
The length of the vector must be equal the number of rows in |
boot_method |
method for performing non-parametric bootstrap confidence intervals
for the respective meta-statistics computed by the |
boot_draws |
number of non-parametric bootstrap draws to sample for the |
max_errors |
the simulation will terminate when more than this number of consecutive
errors are thrown in any
given condition, causing the simulation to continue to the next unique |
resume |
logical; if a temporary |
save_details |
a list pertaining to information regarding how and where files should be saved
when the
|
control |
a list for extra information flags for controlling less commonly used features. These include
|
progress |
logical; display a progress bar (using the |
verbose |
logical; print messages to the R console? Default is |
object |
SimDesign object returned from |
... |
additional arguments |
x |
SimDesign object returned from |
list2char |
logical; for |
For an in-depth tutorial of the package please refer to Chalmers and Adkins (2020;
doi:10.20982/tqmp.16.4.p248).
For an earlier didactic presentation of the package refer to Sigal and Chalmers
(2016; doi:10.1080/10691898.2016.1246953). Finally, see the associated
wiki on Github (https://github.com/philchalmers/SimDesign/wiki)
for tutorial material, examples, and applications of SimDesign
to real-world
simulation experiments, as well as the various vignette files associated with the package.
The strategy for organizing the Monte Carlo simulation work-flow is to
Define a suitable Design
object (a tibble
or data.frame
)
containing fixed conditional
information about the Monte Carlo simulations. Each row of this design
object pertains
to a unique simulation condition to study, while each column codes the simulation factor under
investigation (e.g., sample size,
distribution types, etc). This is often expedited by using the
createDesign
function, and if necessary the argument subset
can be used to remove redundant or non-applicable rows
Define the three step functions to generate the data (Generate
; see also
https://CRAN.R-project.org/view=Distributions for a list of distributions in R),
analyse the generated data by computing the respective parameter estimates, detection rates,
etc (Analyse
), and finally summarise the results across the total
number of replications (Summarise
).
Pass the design
object and three defined R functions to runSimulation
,
and declare the number of replications to perform with the replications
input.
This function will return a suitable
tibble
object with the complete simulation results and execution details
Analyze the output from runSimulation
, possibly using ANOVA techniques
(SimAnova
) and generating suitable plots and tables
Expressing the above more succinctly, the functions to be called have the following form, with the exact functional arguments listed:
Design <- createDesign(...)
Generate <- function(condition, fixed_objects) {...}
Analyse <- function(condition, dat, fixed_objects) {...}
Summarise <- function(condition, results, fixed_objects) {...}
res <- runSimulation(design=Design, replications, generate=Generate,
analyse=Analyse, summarise=Summarise)
The condition
object above represents a single row from the design
object, indicating
a unique Monte Carlo simulation condition. The condition
object also contains two
additional elements to help track the simulation's state: an ID
variable, indicating
the respective row number in the design
object, and a REPLICATION
element
indicating the replication iteration number (an integer value between 1 and
replications).
This setup allows users to easily locate the rth replication (e.g., REPLICATION == 500)
within the jth row in the simulation design (e.g., ID == 2
). The
REPLICATION
input is also useful when temporarily saving files to the hard-drive
when calling external command line utilities (see examples on the wiki).
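For example (an illustrative sketch only; the temporary file naming scheme below is hypothetical), the two trackers can be combined to build file names that are unique across both conditions and replications:

# using the ID and REPLICATION trackers inside Generate(); the file
# name construction is hypothetical
Generate <- function(condition, fixed_objects) {
    tmpfile <- sprintf("tmpdat-row%i-rep%i.dat",
                       condition$ID, condition$REPLICATION)
    dat <- rnorm(condition$N)
    # write dat to tmpfile here for an external utility, then clean up
    dat
}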
For a template-based version of the work-flow, which is often useful when initially
defining a simulation, use the SimFunctions
function. This
function will write a template simulation
to one/two files so that modifying the required functions and objects can begin immediately.
This means that users can focus on their Monte Carlo simulation details right away rather
than worrying about the repetitive administrative code-work required to organize the simulation's
execution flow.
Finally, examples, presentation files, and tutorials can be found on the package wiki located at https://github.com/philchalmers/SimDesign/wiki.
a tibble
from the dplyr
package (also of class 'SimDesign'
)
with the original design
conditions in the left-most columns,
simulation results in the middle columns, and additional information in the right-most columns (see below).
The right-most columns for each condition are:
REPLICATIONS
to indicate the number of Monte Carlo replications,
SIM_TIME
to indicate how long (in seconds) it took to complete
all the Monte Carlo replications for each respective design condition,
RAM_USED
amount of RAM that was in use at the time of completing
each simulation condition,
COMPLETED
to indicate the date in which the given simulation condition completed,
SEED
for the integer values in the seed
argument (if applicable; not
relevant when L'Ecuyer-CMRG method used), and, if applicable,
ERRORS
and WARNINGS
which contain counts for the number of error or warning
messages that were caught (if no errors/warnings were observed these columns will be omitted).
Note that to extract the specific error and warning messages see
SimExtract
. Finally,
if boot_method
was a valid input other than 'none' then the final right-most
columns will contain the labels
BOOT_
followed by the name of the associated meta-statistic defined in summarise()
and the bootstrapped confidence interval locations for the meta-statistics.
To conserve RAM, temporary objects (such as data generated across conditions and replications)
are discarded; however, these can be saved to the hard-disk by passing the appropriate flags.
For longer simulations it is recommended to use the save_results
flag to write the
analysis results to the hard-drive.
The use of the store_seeds
or the save_seeds
options
can be invoked to save R's .Random.seed
state to allow for complete reproducibility of each replication within each condition. These
individual .Random.seed
terms can then be read in with the
load_seed
input to reproduce the exact simulation state at any given replication.
Most often though, saving the complete list of seeds is unnecessary as problematic seeds are
automatically stored in the final simulation object to allow for easier replicability
of potentially problematic errors (which incidentally can be extracted
using SimExtract(res, 'error_seeds')
and passed to the load_seed
argument). Finally,
providing a vector of seeds
is also possible to ensure
that each simulation condition is macro reproducible under the single/multi-core method selected.
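A sketch of that replication workflow, assuming a completed object res that caught at least one error (the Design/Generate/Analyse/Summarise definitions are those from the examples below):

# extract the .Random.seed states attached to caught errors and
# re-run the first problematic state in debug mode
seeds <- SimExtract(res, 'error_seeds')
runSimulation(design=Design, replications=1, load_seed=seeds[,1],
              generate=Generate, analyse=Analyse, summarise=Summarise,
              debug='analyse')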
Finally, when the Monte Carlo simulation is complete
it is recommended to write the results to a hard-drive for safe keeping, particularly with the
filename
argument provided (for reasons that are more obvious in the parallel computation
descriptions below). Using the filename
argument supplied is safer than using, for instance,
saveRDS
directly because files will never accidentally be overwritten,
and instead a new file name will be created when a conflict arises; this type of implementation safety
is prevalent in many locations in the package to help avoid unrecoverable (yet surprisingly
common) mistakes during the process of designing and executing Monte Carlo simulations.
In the event of a computer crash, power outage, etc, if save = TRUE
was used (the default)
then the original code used to execute runSimulation()
need only be re-run to resume
the simulation. The saved temp file will be read into the function automatically, and the
simulation will continue from the condition where it left off before the simulation
state was terminated. If users wish to remove this temporary
simulation state entirely so as to start anew then simply pass SimClean(temp = TRUE)
in the R console to remove any previously saved temporary objects.
When running simulations in parallel (either with parallel = TRUE
or when using the future
approach with a plan()
other than sequential)
R objects defined in the global environment will generally not be visible across nodes.
Hence, you may see errors such as Error: object 'something' not found
if you try to use
an object that is defined in the work space but is not passed to runSimulation
.
To avoid this type of error, simply pass additional objects to the
fixed_objects
input (usually it's convenient to supply a named list of these objects).
Fortunately, however, custom functions defined in the global environment are exported across
nodes automatically. This makes it convenient when writing code because custom functions will
always be available across nodes if they are visible in the R work space. As well, note the
packages
input to declare packages which must be loaded via library()
in order to make
specific non-standard R functions available across nodes.
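A short sketch of that pattern (Design, Analyse, and Summarise as defined in the examples below):

# non-function objects must travel through fixed_objects to be
# visible on parallel nodes
fo <- list(population = c(mu = 10, sigma = 5))
Generate <- function(condition, fixed_objects) {
    pars <- fixed_objects$population
    rnorm(condition$N, mean = pars['mu'], sd = pars['sigma'])
}
# res <- runSimulation(design=Design, replications=100, parallel=TRUE,
#                      fixed_objects=fo, generate=Generate,
#                      analyse=Analyse, summarise=Summarise)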
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations
with the SimDesign Package. The Quantitative Methods for Psychology, 16
(4), 248-280.
doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte
Carlo simulation. Journal of Statistics Education, 24
(3), 136-156.
doi:10.1080/10691898.2016.1246953
SimFunctions
, createDesign
,
Generate
, Analyse
, Summarise
,
SimExtract
,
reSummarise
, SimClean
, SimAnova
, SimResults
,
SimCollect
, Attach
, AnalyseIf
,
SimShiny
, manageWarnings
, runArraySimulation
#-------------------------------------------------------------------------------
# Example 1: Sampling distribution of mean

# This example demonstrates some of the simpler uses of SimDesign,
# particularly for classroom settings. The only factor varied in this
# simulation is sample size.

# skeleton functions to be saved and edited
SimFunctions()

#### Step 1 --- Define your conditions under study and create design data.frame

Design <- createDesign(N = c(10, 20, 30))

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

# help(Generate)
Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5)
    dat
}

# help(Analyse)
Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat)) # mean of the sample data vector
    ret
}

# help(Summarise)
Summarise <- function(condition, results, fixed_objects) {
    # mean and SD summary of the sample means
    ret <- c(mu=mean(results$mean), SE=sd(results$mean))
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Collect results by looping over the rows in design

# run the simulation in testing mode (replications = 2)
Final <- runSimulation(design=Design, replications=2,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final
SimResults(Final)

# reproduce exact simulation
Final_rep <- runSimulation(design=Design, replications=2, seed=Final$SEED,
                           generate=Generate, analyse=Analyse, summarise=Summarise)
Final_rep
SimResults(Final_rep)

## Not run:
# run with more standard number of replications
Final <- runSimulation(design=Design, replications=1000,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final
SimResults(Final)

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Extras

# compare SE estimates to the true SEs from the formula sigma/sqrt(N)
5 / sqrt(Design$N)

# To store the results from the analyse function either
#   a) omit a definition of summarise() to return all results,
#   b) use store_results = TRUE (default) to store results internally and
#      later extract with SimResults(), or
#   c) pass save_results = TRUE to runSimulation() and read the results in
#      with SimResults()
#
# Note that method c) should be adopted for larger simulations, particularly
# if RAM storage could be an issue and error/warning message information is
# important.

# a) approach
res <- runSimulation(design=Design, replications=100,
                     generate=Generate, analyse=Analyse)
res

# b) approach (store_results = TRUE by default)
res <- runSimulation(design=Design, replications=100,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res
SimResults(res)

# c) approach
Final <- runSimulation(design=Design, replications=100, save_results=TRUE,
                       generate=Generate, analyse=Analyse, summarise=Summarise)

# read-in all conditions (can be memory heavy)
res <- SimResults(Final)
res
head(res[[1]]$results)

# just first condition
res <- SimResults(Final, which=1)
head(res$results)
dplyr::tibble(res$condition, res$results)

# obtain empirical bootstrapped CIs once an initial run of the simulation
# was completed (necessarily requires save_results = TRUE)
res <- runSimulation(design=Design, replications=1000, boot_method = 'basic',
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

# alternative bootstrapped CIs that use saved results via reSummarise().
# Default directory saved to:
dirname <- paste0('SimDesign-results_', unname(Sys.info()['nodename']), "/")
res <- reSummarise(summarise=Summarise, dir=dirname, boot_method = 'basic')
res

# remove the saved results from the hard-drive if you no longer want them
SimClean(results = TRUE)

## End(Not run)

#-------------------------------------------------------------------------------
# Example 2: t-test and Welch test when varying sample size, group sizes, and SDs

# skeleton functions to be saved and edited
SimFunctions()

## Not run:
# in real-world simulations it's often better/easier to save
# these functions directly to your hard-drive with
SimFunctions('my-simulation')
## End(Not run)

#### Step 1 --- Define your conditions under study and create design data.frame

Design <- createDesign(sample_size = c(30, 60, 90, 120),
                       group_size_ratio = c(1, 4, 8),
                       standard_deviation_ratio = c(.5, 1, 2))
Design

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects) {
    N <- condition$sample_size # could use Attach() to make objects available
    grs <- condition$group_size_ratio
    sd <- condition$standard_deviation_ratio
    if(grs < 1){
        N2 <- N / (1/grs + 1)
        N1 <- N - N2
    } else {
        N1 <- N / (grs + 1)
        N2 <- N - N1
    }
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)$p.value
    independent <- t.test(DV ~ group, dat, var.equal=TRUE)$p.value

    # In this function the p values for the t-tests are returned,
    # and make sure to name each element, for future reference
    ret <- nc(welch, independent)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (e.g., alpha < .1, .05, .01)
    ret <- EDR(results, alpha = .05)
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Collect results by looping over the rows in design

# first, test to see if it works
res <- runSimulation(design=Design, replications=2,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

## Not run:
# complete run with 1000 replications per condition
res <- runSimulation(design=Design, replications=1000, parallel=TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res
View(res)

## save final results to a file upon completion, and play a beep when done
runSimulation(design=Design, replications=1000, parallel=TRUE, filename = 'mysim',
              generate=Generate, analyse=Analyse, summarise=Summarise, beep=TRUE)

## same as above, but send a notification via Pushbullet upon completion
library(RPushbullet) # read-in default JSON file
runSimulation(design=Design, replications=1000, parallel=TRUE, filename = 'mysim',
              generate=Generate, analyse=Analyse, summarise=Summarise,
              notification = 'complete')

## Submit as RStudio job (requires job package and active RStudio session)
job::job({
    res <- runSimulation(design=Design, replications=100,
                         generate=Generate, analyse=Analyse, summarise=Summarise)
}, title='t-test simulation')
res # object res returned to console when completed

## Debug the generate function. See ?browser for help on debugging.
## Type help to see available commands (e.g., n, c, where, ...),
## ls() to see what has been defined, and type Q to quit the debugger
runSimulation(design=Design, replications=1000,
              generate=Generate, analyse=Analyse, summarise=Summarise,
              parallel=TRUE, debug='generate')

## Alternatively, place a browser() within the desired function line to
## jump to a specific location
Summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (e.g., alpha < .1, .05, .01)
    browser()
    ret <- EDR(results[,nms], alpha = .05)
    ret
}

## The following debugs the analyse function for the
## second row of the Design input
runSimulation(design=Design, replications=1000,
              generate=Generate, analyse=Analyse, summarise=Summarise,
              parallel=TRUE, debug='analyse-2')

####################################
## EXTRA: To run the simulation on a user-defined cluster, use the
## following setup (not run)

## Network linked via ssh (two-way ssh key-paired connection must be
## possible between master and slave nodes)
##
## Define IP addresses, including primary IP
primary <- '192.168.2.20'
IPs <- list(
    list(host=primary, user='phil', ncore=8),
    list(host='192.168.2.17', user='phil', ncore=8)
)
spec <- lapply(IPs, function(IP)
    rep(list(list(host=IP$host, user=IP$user)), IP$ncore))
spec <- unlist(spec, recursive=FALSE)
cl <- parallel::makeCluster(type='PSOCK', master=primary, spec=spec)
res <- runSimulation(design=Design, replications=1000, parallel = TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     cl=cl)

## Using parallel='future' to allow the future framework to be used instead
library(future) # future structure to be used internally
plan(multisession) # specify different plan (default is sequential)
res <- runSimulation(design=Design, replications=100, parallel='future',
                     generate=Generate, analyse=Analyse, summarise=Summarise)
head(res)

# The progressr package is used for progress reporting with futures. To
# redefine, use progressr::handlers() (see below)
library(progressr)
with_progress(res <- runSimulation(design=Design, replications=100,
                                   parallel='future', generate=Generate,
                                   analyse=Analyse, summarise=Summarise))
head(res)

# re-define progressr's bar (below requires cli)
handlers(handler_pbcol(
    adjust = 1.0,
    complete = function(s) cli::bg_red(cli::col_black(s)),
    incomplete = function(s) cli::bg_cyan(cli::col_black(s))
))
with_progress(res <- runSimulation(design=Design, replications=100,
                                   parallel='future', generate=Generate,
                                   analyse=Analyse, summarise=Summarise))

# reset future computing plan when complete (good practice)
plan(sequential)

####################################
###### Post-analysis: Analyze the results via functions like lm() or
###### SimAnova(), and create tables (dplyr) or plots (ggplot2) to help
###### visualize the results. This is where you get to be a data analyst!

library(dplyr)
res %>% summarise(mean(welch), mean(independent))
res %>% group_by(standard_deviation_ratio, group_size_ratio) %>%
    summarise(mean(welch), mean(independent))

# quick ANOVA analysis method with all two-way interactions
SimAnova( ~ (sample_size + group_size_ratio + standard_deviation_ratio)^2, res,
          rates = TRUE)

# or more specific ANOVAs
SimAnova(independent ~ (group_size_ratio + standard_deviation_ratio)^2, res,
         rates = TRUE)

# make some plots
library(ggplot2)
library(tidyr)
dd <- res %>%
    select(group_size_ratio, standard_deviation_ratio, welch, independent) %>%
    pivot_longer(cols=c('welch', 'independent'), names_to = 'stats')
dd

ggplot(dd, aes(factor(group_size_ratio), value)) +
    geom_boxplot() +
    geom_abline(intercept=0.05, slope=0, col = 'red') +
    geom_abline(intercept=0.075, slope=0, col = 'red', linetype='dotted') +
    geom_abline(intercept=0.025, slope=0, col = 'red', linetype='dotted') +
    facet_wrap(~stats)

ggplot(dd, aes(factor(group_size_ratio), value,
               fill = factor(standard_deviation_ratio))) +
    geom_boxplot() +
    geom_abline(intercept=0.05, slope=0, col = 'red') +
    geom_abline(intercept=0.075, slope=0, col = 'red', linetype='dotted') +
    geom_abline(intercept=0.025, slope=0, col = 'red', linetype='dotted') +
    facet_grid(stats~standard_deviation_ratio) +
    theme(legend.position = 'none')

## End(Not run)
Generate multivariate non-normal distributions using the third-order polynomial method described by Vale & Maurelli (1983). If only a single variable is generated then this function is equivalent to the method described by Fleishman (1978).
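For intuition, the univariate Fleishman (1978) power method transforms a standard normal variate through a third-order polynomial whose coefficients are solved to match the requested skewness and kurtosis; the Vale-Maurelli extension additionally adjusts the intermediate correlations so that the transformed variables attain the target covariance matrix. The following is a sketch of the cited method, not of the package internals:

$$Y = a + bZ + cZ^{2} + dZ^{3}, \qquad Z \sim N(0,1), \qquad a = -c$$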
rValeMaurelli(
  n,
  mean = rep(0, nrow(sigma)),
  sigma = diag(length(mean)),
  skew = rep(0, nrow(sigma)),
  kurt = rep(0, nrow(sigma))
)
n: number of samples to draw

mean: a vector of k elements for the mean of the variables

sigma: desired k x k covariance matrix between bivariate non-normal variables

skew: a vector of k elements for the skewness of the variables

kurt: a vector of k elements for the kurtosis of the variables
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43, 521-532.
Vale, C. & Maurelli, V. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48(3), 465-471.
set.seed(1)

# univariate with skew
nonnormal <- rValeMaurelli(10000, mean=10, sigma=5, skew=1, kurt=3)
# psych::describe(nonnormal)

# multivariate with skew and kurtosis
n <- 10000
r12 <- .4
r13 <- .9
r23 <- .1
cor <- matrix(c(1,r12,r13,r12,1,r23,r13,r23,1),3,3)
sk <- c(1.5,1.5,0.5)
ku <- c(3.75,3.5,0.5)
nonnormal <- rValeMaurelli(n, sigma=cor, skew=sk, kurt=ku)
# cor(nonnormal)
# psych::describe(nonnormal)
Hypothesis test to determine whether an observed empirical detection rate, coupled with a given robustness interval, statistically differs from the population value. Uses the methods described by Serlin (2000), and also generates critical values (similar to confidence intervals, but defining a fixed window of robustness). Critical values may be computed without performing the simulation experiment (hence, they can be obtained a priori).
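As a rough sketch of where such critical values come from (an illustrative normal approximation to the binomial, stated here as an assumption rather than a quotation of Serlin's exact expressions): an empirical detection rate based on R replications has standard error sqrt(p(1 - p)/R), so an upper robustness cutoff around alpha + delta is approximately

$$\mathrm{CV} \approx (\alpha + \delta) + z_{\mathrm{CI}} \sqrt{\frac{(\alpha + \delta)\,\bigl(1 - (\alpha + \delta)\bigr)}{R}}$$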
Serlin2000(p, alpha, delta, R, CI = 0.95)
p: (optional) a vector containing the empirical detection rate(s) to be tested. Omitting this input will compute only the CV1 and CV2 values, while including this input will perform a one-sided hypothesis test for robustness

alpha: Type I error rate (e.g., often set to .05)

delta: (optional) symmetric robustness interval around alpha

R: number of replications used in the simulation

CI: confidence interval level (default .95)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Serlin, R. C. (2000). Testing for Robustness in Monte Carlo Studies. Psychological Methods, 5, 230-240.

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
# Cochran's criteria at alpha = .05 (i.e., .05 +- .01), assuming R = 2000
Serlin2000(p = .051, alpha = .05, delta = .01, R = 2000)

# Bradley's liberal criteria given p = .06 and .076, assuming R = 1000
Serlin2000(p = .060, alpha = .05, delta = .025, R = 1000)
Serlin2000(p = .076, alpha = .05, delta = .025, R = 1000)

# multiple p-values
Serlin2000(p = c(.05, .06, .07), alpha = .05, delta = .025, R = 1000)

# CV values computed before simulation performed
Serlin2000(alpha = .05, R = 2500)
Given a simulation that was executed with runSimulation, potentially with the argument store_results = TRUE to store the unsummarised analysis results, fit a surrogate function approximation (SFA) model to the results and (optionally) perform a root-solving step to solve for a target quantity. See Schoemann et al. (2014) for details.
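Conceptually, with the default family = "binomial" the surrogate is a logistic regression of replication-level significance indicators on the design variables, which can then be root-solved for a target rate. A minimal stand-alone sketch of that idea in base R follows; glm() and uniroot() here are illustrative stand-ins for the internals, and sim_results is assumed to resemble the object constructed in the examples below (a logical sig column and sample sizes N):

## Not run:
# illustrative surrogate fit: logit(Pr(sig)) = b0 + b1 * N
fit <- glm(sig ~ N, family = binomial, data = sim_results)

# root-solve for the N at which the predicted power reaches .80
root <- uniroot(function(N)
    predict(fit, newdata = data.frame(N = N), type = 'response') - .80,
    interval = c(100, 500))
root$root
## End(Not run)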
SFA(
  results,
  formula,
  family = "binomial",
  b = NULL,
  design = NULL,
  CI = 0.95,
  interval = NULL,
  ...
)

## S3 method for class 'SFA'
print(x, ...)
results: data returned from SimResults() (e.g., the replication-level results from a runSimulation() call with store_results = TRUE; see the examples)

formula: formula to specify for the regression model

family: character vector indicating the family of GLMs to use (see family)

b: (optional) target quantity to use for root solving given the fitted surrogate function (e.g., find sample size associated with SFA implied power of .80)

design: (optional) a one-row data.frame of design values to hold fixed during root solving, where the quantity to be solved is set to NA (see the examples)

CI: advertised confidence interval of SFA prediction around solved target

interval: interval to be passed to uniroot()

...: additional arguments to pass to the underlying model-fitting call

x: an object of class SFA
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Schoemann, A. M., Miller, P., Pornprasertmanit, S., and Wu, W. (2014). Using Monte Carlo simulations to determine power and sample size for planned missing designs. International Journal of Behavioral Development, SAGE Publications, 38, 471-479.
## Not run:
# create long Design object to fit surrogate over
Design <- createDesign(N = 100:500, d = .2)
Design

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects) {
    Attach(condition)
    group1 <- rnorm(N)
    group2 <- rnorm(N, mean=d)
    dat <- data.frame(group = gl(2, N, labels=c('G1', 'G2')),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    p <- c(p = t.test(DV ~ group, dat, var.equal=TRUE)$p.value)
    p
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha = .05)
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Estimate power over N

# Use small number of replications given range of sample sizes
## note: given the low number of replications, disabling the
## RAM printing will help reduce overhead
sim <- runSimulation(design=Design, replications=10,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     store_results=TRUE, save=FALSE, progress=FALSE,
                     control=list(print_RAM=FALSE))
sim

# total of 4010 replications
sum(sim$REPLICATIONS)

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# fitted model
sfa <- SFA(sim_results, formula = sig ~ N)
sfa
summary(sfa)

# plot the observed and SFA expected values
plot(p ~ N, sim, las=1, pch=16, main='Rejection rates with R=10')
pred <- predict(sfa, type = 'response')
lines(sim_results$N, pred, col='red', lty=2)

# fitted model + root-solved solution given f(.) = b,
# where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N, b=.8, design=design)
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

################
# example with smaller range but higher precision
Design <- createDesign(N = 375:425, d = .2)
Design

sim2 <- runSimulation(design=Design, replications=100,
                      generate=Generate, analyse=Analyse, summarise=Summarise,
                      store_results=TRUE, save=FALSE, progress=FALSE,
                      control=list(print_RAM=FALSE))
sim2
sum(sim2$REPLICATIONS) # more replications in total

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim2)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# fitted model
sfa <- SFA(sim_results, formula = sig ~ N)
sfa
summary(sfa)

# plot the observed and SFA expected values
plot(p ~ N, sim2, las=1, pch=16, main='Rejection rates with R=100')
pred <- predict(sfa, type = 'response')
lines(sim_results$N, pred, col='red', lty=2)

# fitted model + root-solved solution given f(.) = b,
# where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N, b=.8, design=design,
                interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

###################
# vary multiple parameters (e.g., sample size + effect size) to fit
# multi-parameter surrogate

Design <- createDesign(N = seq(from=10, to=500, by=10),
                       d = seq(from=.1, to=.5, by=.1))
Design

sim3 <- runSimulation(design=Design, replications=50,
                      generate=Generate, analyse=Analyse, summarise=Summarise,
                      store_results=TRUE, save=FALSE, progress=FALSE,
                      control=list(print_RAM=FALSE))
sim3
sum(sim3$REPLICATIONS)

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim3)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# additive effects (logit(sig) ~ N + d)
sfa0 <- SFA(sim_results, formula = sig ~ N+d)
sfa0

# multiplicative effects (logit(sig) ~ N + d + N:d)
sfa <- SFA(sim_results, formula = sig ~ N*d)
sfa

# multiplicative better fit (sample size interacts with effect size)
anova(sfa0, sfa, test = "LRT")
summary(sfa)

# plot the observed and SFA expected values
library(ggplot2)
sim3$pred <- predict(sfa, type = 'response', newdata=sim3)
ggplot(sim3, aes(N, p, color = factor(d))) +
    geom_point() + geom_line(aes(y=pred)) +
    facet_wrap(~factor(d))

# fitted model + root-solved solution given f(.) = b,
# where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N * d, b=.8,
                design=design, interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

# root prediction where d *not* used in original data
design <- data.frame(N=NA, d=.25)
sfa.root <- SFA(sim_results, formula = sig ~ N * d, b=.8,
                design=design, interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.25)

## End(Not run)
Given the results from a simulation executed with runSimulation, form an ANOVA table (without p-values) with effect sizes based on the eta-squared statistic. These results provide approximate indications of observable simulation effects; as such, the ANOVA-based results are generally useful as exploratory rather than inferential tools.
SimAnova(formula, dat, subset = NULL, rates = TRUE)
formula: an R formula, generally of a form suitable for lm()

dat: an object returned from runSimulation()

subset: an optional argument to be passed to subset()

rates: logical; does the dependent variable consist of rates (e.g., returned from EDR())?
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
data(BF_sim)

# all results (not usually good to mix Power and Type I results together)
SimAnova(alpha.05.F ~ (groups_equal + distribution)^2, BF_sim)

# only use anova for Type I error conditions
SimAnova(alpha.05.F ~ (groups_equal + distribution)^2, BF_sim,
         subset = var_ratio == 1)

# run all DVs at once using the same formula
SimAnova(~ groups_equal * distribution, BF_sim, subset = var_ratio == 1)
Given the saved files from a runArraySimulation remote evaluation, check whether all .rds files have been saved. If any are missing, the missing row condition numbers will be returned.
SimCheck(dir = NULL, files = NULL, min = 1L, max = NULL)
dir: character vector input indicating the directory containing the saved .rds files

files: vector of file names referring to the saved simulation files (e.g., as obtained from dir())

min: minimum number expected after the file-name prefix (default 1)

max: maximum number expected after the file-name prefix
returns an invisible TRUE if all the files are present and FALSE otherwise
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
runArraySimulation, SimCollect
## Not run:
# if files are in mysimfiles/ directory
SimCheck('mysimfiles')

# specifying files explicitly
setwd('mysimfiles/')
SimCheck(files=dir())

## End(Not run)
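The min and max arguments restrict the range of file indices checked. A small sketch under the assumption that the array files were saved with a common prefix and numeric suffixes (the file names implied below are hypothetical):

## Not run:
# verify that files job-1.rds through job-300.rds are all present
SimCheck('mysimfiles', min = 1L, max = 300L)
## End(Not run)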
This function is mainly used in pilot studies where results and datasets have been temporarily saved by runSimulation but should be removed before beginning the full Monte Carlo simulation (e.g., remove files and folders which contained bugs/biased results).
SimClean(
  ...,
  dirs = NULL,
  temp = TRUE,
  results = FALSE,
  seeds = FALSE,
  save_details = list()
)
...: one or more character objects indicating which files to remove (e.g., 'results.rds')

dirs: a character vector indicating which directories to remove

temp: logical; remove the temporary file saved when passing save = TRUE?

results: logical; remove the saved results directories generated when passing save_results = TRUE?

seeds: logical; remove the seed files saved when passing save_seeds = TRUE?

save_details: a list pertaining to information about how and where files were saved (see the corresponding list in runSimulation())
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run:
# remove file called 'results.rds'
SimClean('results.rds')

# remove default temp file
SimClean()

# remove customized saved-results directory called 'mydir'
SimClean(results = TRUE, save_details = list(save_results_dirname = 'mydir'))

## End(Not run)
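The dirs argument removes entire directories in a single call; a short sketch with hypothetical directory names:

## Not run:
# remove scratch directories left over from pilot runs
SimClean(dirs = c('sim_files/', 'old_sim_files/'))
## End(Not run)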
This function collects and aggregates the results from SimDesign's runSimulation into a single object suitable for post-analyses, or combines all the saved results directories into one. This is useful when results are run piece-wise on one node (e.g., 500 replications in one batch, 500 again at a later date, though be careful about the set.seed use, as the random numbers will tend to correlate the more it is used) or run independently across different nodes/computing cores (e.g., see runArraySimulation).
SimCollect(
  dir = NULL,
  files = NULL,
  filename = NULL,
  select = NULL,
  check.only = FALSE,
  target.reps = NULL,
  warning_details = FALSE,
  error_details = TRUE
)

aggregate_simulations(...)
dir: a character vector indicating the directory containing the saved simulation .rds files

files: a character vector containing the names of the saved simulation files

filename: (optional) name of .rds file to save the aggregate simulation file to. If not specified then the results will only be returned in the R console

select: a character vector indicating which columns/variables to select from the stored results

check.only: logical; for larger simulation file sets, such as those generated by runArraySimulation(), only check whether the target number of replications has been satisfied rather than aggregating the files (see the examples)

target.reps: (optional) number of replications to check against to evaluate whether the simulation files returned the desired number of replications. If missing, the highest detected value from the collected set of replication information will be used

warning_details: logical; include the aggregate of the warnings to be extracted via SimExtract()?

error_details: logical; include the aggregate of the errors to be extracted via SimExtract()?

...: not used
returns a data.frame/tibble with the (weighted) average/aggregate of the simulation results
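As a conceptual sketch of the aggregation step (a replication-weighted mean is assumed here, paraphrasing the "(weighted) average" wording above rather than quoting the internals), combining K batches in which batch k contributes R_k replications and summary statistic x̄_k gives

$$\bar{x} = \frac{\sum_{k=1}^{K} R_k \, \bar{x}_k}{\sum_{k=1}^{K} R_k}$$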
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
runSimulation, runArraySimulation, SimCheck
## Not run:
setwd('my_working_directory')

## run simulations to save the .rds files (or move them to the working directory)
# seeds1 <- genSeeds(design)
# seeds2 <- genSeeds(design, old.seeds=seeds1)
# ret1 <- runSimulation(design, ..., seed=seeds1, filename='file1')
# ret2 <- runSimulation(design, ..., seed=seeds2, filename='file2')

# saves to the hard-drive and stores in workspace
final <- SimCollect(files = c('file1.rds', 'file2.rds'))
final

# If filename not included, can be extracted from results
# files <- c(SimExtract(ret1, 'filename'), SimExtract(ret2, 'filename'))
# final <- SimCollect(files = files)

#################################################
# Example where each row condition is repeated, evaluated independently,
# and later collapsed into a single analysis object

# Each condition repeated four times (hence, replications
# should be set to desired.reps/4)
Design <- createDesign(mu = c(0,5),
                       N = c(30, 60))
Design

# assume the N=60 takes longer, and should be spread out across more arrays
Design_long <- expandDesign(Design, c(2,2,4,4))
Design_long

replications <- c(rep(50, 4), rep(25, 8))
data.frame(Design_long, replications)

#-------------------------------------------------------------------

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, mean=mu))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat), SD=sd(dat))
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- colMeans(results)
    ret
}

#-------------------------------------------------------------------

# create directory to store all final simulation files
dir.create('sim_files/')

iseed <- genSeeds()

# distribute jobs independently
sapply(1:nrow(Design_long), \(i) {
    runArraySimulation(design=Design_long, replications=replications,
                       generate=Generate, analyse=Analyse,
                       summarise=Summarise, arrayID=i,
                       dirname='sim_files/', filename='job', iseed=iseed)
}) |> invisible()

# check that all replications satisfy target
SimCollect('sim_files/', check.only = TRUE)

# what would be returned had the target number of replications been 1000
SimCollect('sim_files/', check.only = TRUE, target.reps=1000)

# aggregate into single object
sim <- SimCollect('sim_files/')
sim

SimClean(dirs='sim_files/')

## End(Not run)
Structure for Organizing Monte Carlo Simulation Designs
Provides tools to help organize Monte Carlo simulations in R. The package controls the structure and back-end of Monte Carlo simulations by utilizing a general generate-analyse-summarise strategy. The functions provided control common simulation issues, such as re-simulating non-convergent results, support parallel back-end computations with proper random number generation within each simulation condition, save and restore temporary files, aggregate results across independent nodes, and provide native support for debugging. The primary function for organizing the simulations is runSimulation, while for array jobs submitted to HPC clusters (e.g., SLURM) see runArraySimulation and the associated package vignettes.

For an in-depth tutorial of the package please refer to Chalmers and Adkins (2020; doi:10.20982/tqmp.16.4.p248). For an earlier didactic presentation of the package users can refer to Sigal and Chalmers (2016; doi:10.1080/10691898.2016.1246953). Finally, see the associated wiki on GitHub (https://github.com/philchalmers/SimDesign/wiki) for other tutorial material, examples, and applications of SimDesign to real-world simulations.
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Function used to extract any error or warning messages, the seeds associated with any error or warning messages, and any analysis results that were stored in the final simulation object.
SimExtract(object, what, fuzzy = TRUE, append = TRUE)
object: object returned from runSimulation()

what: character indicating what information to extract. Possible inputs include 'errors', 'warnings', 'error_seeds', 'results', and 'filename' (see the examples here and in SimCollect)

fuzzy: logical; use fuzzy string matching to reduce effectively identical messages? For example, when attempting to invert a matrix the error messages "System is computationally singular: reciprocal condition number = 1.92747e-17" and "System is computationally singular: reciprocal condition number = 2.15321e-16" are effectively the same, and likely should be reported in the same columns of the extracted output

append: logical; append the design conditions when extracting error/warning messages?
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run:
Generate <- function(condition, fixed_objects) {
    int <- sample(1:10, 1)
    if(int > 5) warning('GENERATE WARNING: int greater than 5')
    if(int == 1) stop('GENERATE ERROR: integer is 1')
    rnorm(5)
}

Analyse <- function(condition, dat, fixed_objects) {
    int <- sample(1:10, 1)
    if(int > 5) warning('ANALYSE WARNING: int greater than 5')
    if(int == 1) stop('ANALYSE ERROR: int is 1')
    c(ret = 1)
}

Summarise <- function(condition, results, fixed_objects) {
    mean(results)
}

res <- runSimulation(replications = 100, seed=1234, verbose=FALSE,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

SimExtract(res, what = 'errors')
SimExtract(res, what = 'warnings')

seeds <- SimExtract(res, what = 'error_seeds')
seeds[,1:3]

# replicate a specific error for debugging (type Q to exit debugger)
res <- runSimulation(replications = 100, load_seed=seeds[,1], debug='analyse',
                     generate=Generate, analyse=Analyse, summarise=Summarise)

## End(Not run)
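Similarly, stored replication-level analysis results can be pulled out; a short sketch continuing the example above (the 'results' input is assumed from the description rather than demonstrated in the original examples):

## Not run:
# extract the stored replication-level analysis results
SimExtract(res, what = 'results')
## End(Not run)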
This function prints template versions of the required Design and Generate-Analyse-Summarise functions for SimDesign to run simulations. Templated output comes complete with the correct inputs, class of outputs, and optional comments to help with the initial definitions. Use this at the start of your Monte Carlo simulation study. Following the definition of the SimDesign template file, please refer to the detailed information in runSimulation for how to edit this template to make a working simulation study.
SimFunctions(
  filename = NULL,
  dir = getwd(),
  save_structure = "single",
  extra_file = FALSE,
  nAnalyses = 1,
  nGenerate = 1,
  summarise = TRUE,
  comments = FALSE,
  openFiles = TRUE,
  spin_header = TRUE,
  SimSolve = FALSE
)
filename: a character vector indicating whether the output should be saved to two respective files containing the simulation design and the functional components, respectively. Using this option is generally the recommended approach when beginning to write a Monte Carlo simulation

dir: the directory to write the files to. Default is the working directory

save_structure: character indicating the number of files to break the simulation code into when filename is supplied (e.g., 'single' or 'all'; see the examples)

extra_file: logical; should an extra file be saved containing user-defined functions or objects? Default is FALSE

nAnalyses: number of analysis functions to create (default is 1). Increasing the value of this argument when independent analyses are being performed allows function definitions to be better partitioned and potentially more modular

nGenerate: number of generate functions to create (default is 1). Increase the value of this argument when the data generation functions are very different and should be isolated from each other (otherwise, if there is much in common between the generate steps, the default of 1 should be preferred)

summarise: logical; include a Summarise function template? Default is TRUE

comments: logical; include helpful comments? Default is FALSE

openFiles: logical; after files have been generated, open them in your text editor (e.g., if RStudio is running the scripts will open in a new tab)?

spin_header: logical; include a basic knitr::spin header? Default is TRUE

SimSolve: logical; should the template be generated for a SimSolve() implementation? Default is FALSE
The recommended approach to organizing Monte Carlo simulation files is to first save the template generated by this function to the hard-drive by passing a suitable filename argument (which, if users are interacting with R via the RStudio IDE, will also open the template file after it has been saved). For larger simulations, two separate files could also be used (achieved by changing out.files), and may be easier for debugging/sourcing the simulation code; however, this is a matter of preference and does not change any functionality in the package.
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
SimFunctions()
SimFunctions(comments = TRUE) # with helpful comments

## Not run:
# write output files to a single file with comments
SimFunctions('mysim', comments = TRUE)

# Multiple analysis functions for optional partitioning
SimFunctions(nAnalyses = 2)
SimFunctions(nAnalyses = 3)

# Multiple analysis + generate functions
SimFunctions(nAnalyses = 2, nGenerate = 2)

# save multiple files for the purpose of designing larger simulations
# (also include extra_file for user-defined objects/functions)
SimFunctions('myBigSim', save_structure = 'all',
             nAnalyses = 3, nGenerate = 2, extra_file = TRUE)

## End(Not run)
If runSimulation was passed the flag save_results = TRUE then the row results corresponding to the design object will be stored to a suitable sub-directory as individual .rds files. While users could use readRDS directly to read these files in themselves, this convenience function will read the desired rows in automatically given the returned object from the simulation. Can be used to read in 1 or more .rds files at once (if more than 1 file is read in then the result will be stored in a list).
SimResults(obj, which, prefix = "results-row", wd = getwd())
obj: object returned from runSimulation()

which: a numeric vector indicating which rows should be read in. If missing, all rows will be read in

prefix: character indicating the prefix used for the stored files

wd: working directory; default is found with getwd()
the returned result is either a nested list (when length(which) > 1) or a single list (when length(which) == 1) containing the simulation results. Each read-in result refers to a list of 4 elements:

condition: the associated row (ID) and conditions from the respective design object

results: the object returned from the analyse function, potentially simplified into a matrix or data.frame

errors: a table containing the message and number of errors that caused the generate-analyse steps to be rerun. These should be inspected carefully, as they could indicate validity issues with the simulation that should be noted

warnings: a table containing the message and number of non-fatal warnings which arose from the analyse step. These should be inspected carefully, as they could indicate validity issues with the simulation that should be noted
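For instance, a minimal sketch of inspecting these four elements for a single row (assuming obj was returned from a save_results = TRUE run, as in the examples below):

## Not run:
row1 <- SimResults(obj, which = 1)
row1$condition      # design row (ID) and conditions for this result
head(row1$results)  # analyse() output across replications
row1$errors         # tabulated error messages (inspect carefully)
row1$warnings       # tabulated warning messages (inspect carefully)
## End(Not run)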
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run:
# store results (default behaviour)
sim <- runSimulation(..., store_results = TRUE)
SimResults(sim)

# store results to drive if RAM issues are present
obj <- runSimulation(..., save_results = TRUE)

# row 1 results
row1 <- SimResults(obj, 1)

# rows 1:5, stored in a named list
rows_1to5 <- SimResults(obj, 1:5)

# all results
rows_all <- SimResults(obj)

## End(Not run)
This function generates suitable stand-alone code from the shiny package to create
simple web interfaces for performing single-condition Monte Carlo simulations. The
template generated is relatively minimalistic, but allows the user to quickly and
easily edit the saved files to customize the associated shiny elements as they
see fit.
SimShiny(filename = NULL, dir = getwd(), design, ...)
filename | an optional name of a text file to save the server and UI components (e.g., 'mysimGUI.R'). If omitted, the code will be printed to the R console instead
dir | the directory to write the files to. Default is the working directory
design | design object (e.g., from createDesign) containing the simulation conditions
... | arguments to be passed to runSimulation (e.g., the generate, analyse, and summarise functions)
Phil Chalmers [email protected]
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
## Not run: 
Design <- createDesign(sample_size = c(30, 60, 90, 120),
                       group_size_ratio = c(1, 4, 8),
                       standard_deviation_ratio = c(.5, 1, 2))

Generate <- function(condition, fixed_objects) {
    N <- condition$sample_size
    grs <- condition$group_size_ratio
    sd <- condition$standard_deviation_ratio
    if(grs < 1){
        N2 <- N / (1/grs + 1)
        N1 <- N - N2
    } else {
        N1 <- N / (grs + 1)
        N2 <- N - N1
    }
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)
    ind <- t.test(DV ~ group, dat, var.equal=TRUE)

    # In this function the p values for the t-tests are returned,
    # and make sure to name each element, for future reference
    ret <- c(welch = welch$p.value, independent = ind$p.value)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (e.g., alpha < .1, .05, .01)
    ret <- EDR(results, alpha = .05)
    ret
}

# test that it works
# Final <- runSimulation(design=Design, replications=5,
#                        generate=Generate, analyse=Analyse, summarise=Summarise)

# print code to console
SimShiny(design=Design, generate=Generate, analyse=Analyse,
         summarise=Summarise, verbose=FALSE)

# save shiny code to file
SimShiny('app.R', design=Design, generate=Generate, analyse=Analyse,
         summarise=Summarise, verbose=FALSE)

# run the application
shiny::runApp()
shiny::runApp(launch.browser = TRUE) # in web-browser

## End(Not run)
This function provides a stochastic root-finding approach to solving for specific
quantities in simulation experiments (e.g., solving for a specific sample size to
meet a target power rate) using the Probabilistic Bisection Algorithm with
Bolstering and Interpolations (ProBABLI; Chalmers, in press). The structure follows
the steps outlined in runSimulation, however portions of the design input are taken
as variables to be estimated rather than fixed, and the constant b is required in
order to solve the root equation f(x) - b = 0. The stochastic root search terminates
based on the behavior of the successive x estimates. For even greater advertised
accuracy with ProBABLI, the termination criteria can instead be based on the width
of the advertised prediction interval (via predCI.tol) or on how long the
investigator is willing to wait for the final estimates (via wait.time, where longer
wait times lead to progressively better accuracy in the final estimates).
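Conceptually, ProBABLI treats the whole generate-analyse-summarise pipeline as a
noisy objective function. The following sketch (using the Generate, Analyse, and
Summarise definitions from the Examples below) shows the quantity being root-solved;
it is illustrative only, as SimSolve manages the replication schedule and bisection
internally:

# f(x): evaluate the simulation at a candidate root value x (here, N) and
# return the summarised statistic; SimSolve searches for x where f(x) - b = 0
f <- function(x, replications = 100) {
    cond <- createDesign(N = x, d = .5, sig.level = .05)
    res <- runSimulation(cond, replications = replications,
                         generate = Generate, analyse = Analyse,
                         summarise = Summarise, verbose = FALSE)
    res$power   # noisy evaluation of f(x)
}
# e.g., with b = .8, the root of f(x) - b is the N giving 80% power at d = .5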
SimSolve(
  design,
  interval,
  b,
  generate,
  analyse,
  summarise,
  replications = list(burnin.iter = 15L, burnin.reps = 100L,
    max.reps = 500L, min.total.reps = 9000L, increase.by = 10L),
  integer = TRUE,
  formula = y ~ poly(x, 2),
  family = "binomial",
  parallel = FALSE,
  cl = NULL,
  save = TRUE,
  resume = TRUE,
  method = "ProBABLI",
  wait.time = NULL,
  ncores = parallelly::availableCores(omit = 1L),
  type = ifelse(.Platform$OS.type == "windows", "PSOCK", "FORK"),
  maxiter = 100L,
  check.interval = TRUE,
  verbose = TRUE,
  control = list(),
  predCI = 0.95,
  predCI.tol = NULL,
  ...
)

## S3 method for class 'SimSolve'
summary(object, tab.only = FALSE, reps.cutoff = 300, ...)

## S3 method for class 'SimSolve'
plot(x, y, ...)
design | a design object containing the simulation conditions to be studied, where the unknown quantity to be solved must be specified with NA placeholders (see the Examples below)
interval | a vector of length two specifying the lower and upper search range of the root, or a matrix with one row per design condition when different search ranges are required
b | a single constant used to solve the root equation f(x) - b = 0
generate | generate function. See runSimulation
analyse | analysis function. See runSimulation
summarise | summary function that returns a single number corresponding to a function evaluation f(x)
replications | a named list or vector indicating the number of replications to use for each design condition per PBA iteration. By default the input is a list with the elements burnin.iter, burnin.reps, max.reps, min.total.reps, and increase.by (see the Usage section for the defaults). Vector inputs can specify the exact replications for each iteration (see the sketch following this table). As a general rule, early iterations should use relatively few replications to avoid unnecessary computations while locating the approximate root, after which the number of replications should gradually increase to reduce the sampling variability as the PBA approaches the root
integer | logical; should the values of the root be considered integer or numeric? If TRUE (the default), candidate values are searched over the integers (e.g., sample sizes); set to FALSE for continuous quantities such as effect sizes
formula | regression formula to use for the model-based interpolation of the root (the default fits a quadratic polynomial)
family | family argument passed to the interpolation model (default 'binomial'). Note that if individual results from the analyse steps are not dichotomous then another family (e.g., 'gaussian') may be more suitable
parallel | for parallel computing for slower simulation experiments (see runSimulation for details)
cl | see runSimulation
save | logical; store a temporary file in case of crashes. If detected in the working directory this file will automatically be loaded to resume (see resume)
resume | logical; if a temporary SimSolve file is detected, should the simulation resume from this location?
method | optimizer method to use. Default is the stochastic root-finder 'ProBABLI'
wait.time | (optional) argument indicating how long the root-solver should continue searching (e.g., wait.time = "5" runs each condition for 5 minutes; see timeFormater for the accepted input formats)
ncores | see runSimulation
type | type of cluster object to define. If unspecified, a 'PSOCK' cluster is used on Windows and a 'FORK' cluster otherwise
maxiter | the maximum number of iterations (default 100)
check.interval | logical; should an initial check be made to determine whether the supplied interval bounds actually bracket the root? Default is TRUE
verbose | logical; print information to the console?
control | a list of additional control parameters to pass to the internal estimation functions
predCI | advertised confidence interval probability for the final model-based prediction of the target b given the root input estimate (default .95)
predCI.tol | (optional) rather than relying on the changes between successive estimates (default), terminate the search when the prediction CI is consistently within this supplied tolerance range. This provides termination behaviour based on the predicted precision of the root solutions rather than their stability history, and therefore can be used to obtain estimates with a particular level of advertised accuracy. For example, when solving for a sample size value with b = .8 and predCI.tol = .01, the search terminates once the prediction interval consistently falls within [.795, .805]
... | additional arguments to be passed to the generate-analyse-summarise functions
object | object of class 'SimSolve'
tab.only | logical; print only the (reduced) table of estimates?
reps.cutoff | integer indicating the rows to omit from the output if the number of replications does not reach this value
x | object of class 'SimSolve'
y | design row to plot. If omitted, defaults to 1
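As referenced in the replications entry above, a vector input gives full control
over the per-iteration replication schedule; a sketch under the default
maxiter = 100 (the particular values are illustrative only):

# few replications early to localize the root cheaply,
# many replications later to stabilize the final estimates
reps <- c(rep(50, 25), rep(250, 50), rep(500, 25))  # length equals maxiter
# solved <- SimSolve(design=Design, b=.8, interval=c(10, 500),
#                    generate=Generate, analyse=Analyse,
#                    summarise=Summarise, replications=reps)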
Root finding is performed using a progressively bolstered version of the
probabilistic bisection algorithm (PBA
) to find the
associated root given the noisy simulation
objective function evaluations. Information is collected throughout
the search to make more accurate predictions about the
associated root via interpolation. If interpolations fail, then the last
iteration of the PBA search is returned as the best guess.
the filled-in design
object containing the associated lower and upper interval
estimates from the stochastic optimization
Phil Chalmers [email protected]
Chalmers, R. P. (in press). Solving Variables with Monte Carlo Simulation Experiments: A Stochastic Root-Solving Approach. Psychological Methods. doi:10.1037/met0000689

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
## Not run: 
##########################
## A Priori Power Analysis
##########################

# GOAL: Find specific sample size in each group for independent t-test
# corresponding to a power rate of .8
#
# For ease of the setup, assume the groups are the same size, and the mean
# difference corresponds to Cohen's d values of .2, .5, and .8

# This example can be solved numerically using the pwr package (see below),
# though the following simulation setup is far more general and can be
# used for any generate-analyse combination of interest

# SimFunctions(SimSolve=TRUE)

#### Step 1 --- Define your conditions under study and create design data.frame.
#### However, use NA placeholder for sample size as it must be solved,
#### and add desired power rate to object

Design <- createDesign(N = NA,
                       d = c(.2, .5, .8),
                       sig.level = .05)
Design    # solve for NA's

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects) {
    Attach(condition)
    group1 <- rnorm(N)
    group2 <- rnorm(N, mean=d)
    dat <- data.frame(group = gl(2, N, labels=c('G1', 'G2')),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    p <- t.test(DV ~ group, dat, var.equal=TRUE)$p.value
    p
}

Summarise <- function(condition, results, fixed_objects) {
    # Must return a single number corresponding to f(x) in the
    # root equation f(x) = b
    ret <- c(power = EDR(results, alpha = condition$sig.level))
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Optimize N over the rows in design

### (For debugging) may want to see if simulation code works as intended first
### for some given set of inputs
# runSimulation(design=createDesign(N=100, d=.8, sig.level=.05),
#               replications=10, generate=Generate, analyse=Analyse,
#               summarise=Summarise)

# Initial search between N = [10,500] for each row using the default
# integer solver (integer = TRUE). In this example, b = target power
solved <- SimSolve(design=Design, b=.8, interval=c(10, 500),
                   generate=Generate, analyse=Analyse,
                   summarise=Summarise)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)

# also can plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')

# verify with true power from pwr package
library(pwr)
pwr.t.test(d=.2, power = .8) # sig.level/alpha = .05 by default
pwr.t.test(d=.5, power = .8)
pwr.t.test(d=.8, power = .8)

# use estimated N results to see how close power was
N <- solved$N
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])

# with rounding
N <- ceiling(solved$N)
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])

### failing analytic formula, confirm results with more precise
### simulation via runSimulation()
### (not required, if accuracy is important then ProBABLI should be run longer)
# csolved <- solved
# csolved$N <- ceiling(solved$N)
# confirm <- runSimulation(design=csolved, replications=10000, parallel=TRUE,
#                          generate=Generate, analyse=Analyse,
#                          summarise=Summarise)
# confirm

# Similarly, terminate if the prediction interval is consistently predicted
# to be between [.795, .805]. Note that maxiter increased as well
solved_predCI <- SimSolve(design=Design, b=.8, interval=c(10, 500),
                          generate=Generate, analyse=Analyse,
                          summarise=Summarise, maxiter=200, predCI.tol=.01)
solved_predCI
summary(solved_predCI) # note that predCI.b are all within [.795, .805]

N <- solved_predCI$N
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])

# Alternatively, and often more realistically, wait.time can be used
# to specify how long the user is willing to wait for a final estimate.
# Solutions involving more iterations will be more accurate,
# and therefore it is recommended to run the ProBABLI root-solver as long as
# the analyst can tolerate if the most accurate estimates are desired.
# Below executes the simulation for 5 minutes for each condition up
# to a maximum of 1000 iterations, terminating based on whichever occurs first
solved_5min <- SimSolve(design=Design, b=.8, interval=c(10, 500),
                        generate=Generate, analyse=Analyse,
                        summarise=Summarise, wait.time="5", maxiter=1000)
solved_5min
summary(solved_5min)

# use estimated N results to see how close power was
N <- solved_5min$N
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])

#------------------------------------------------

#######################
## Sensitivity Analysis
#######################

# GOAL: solve effect size d given sample size and power inputs (inputs
# for root no longer required to be an integer)

# Generate-Analyse-Summarise functions identical to above, however
# Design input includes NA for d element
Design <- createDesign(N = c(100, 50, 25),
                       d = NA,
                       sig.level = .05)
Design    # solve for NA's

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions (same as above)

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Optimize d over the rows in design

# search between d = [.1, 2] for each row
# In this example, b = target power
# note that integer = FALSE to allow smooth updates of d
solved <- SimSolve(design=Design, b = .8, interval=c(.1, 2),
                   generate=Generate, analyse=Analyse,
                   summarise=Summarise, integer=FALSE)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)

# plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')

# verify with true power from pwr package
library(pwr)
pwr.t.test(n=100, power = .8)
pwr.t.test(n=50, power = .8)
pwr.t.test(n=25, power = .8)

# use estimated d results to see how close power was
pwr.t.test(n=100, d = solved$d[1])
pwr.t.test(n=50, d = solved$d[2])
pwr.t.test(n=25, d = solved$d[3])

### failing analytic formula, confirm results with more precise
### simulation via runSimulation() (not required; if accuracy is important
### ProBABLI should just be run longer)
# confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,
#                          generate=Generate, analyse=Analyse,
#                          summarise=Summarise)
# confirm

#------------------------------------------------

#####################
## Criterion Analysis
#####################

# GOAL: solve Type I error rate (alpha) given sample size, effect size, and
# power inputs (inputs for root no longer required to be an integer). Only useful
# when Type I error is less important than achieving the desired 1-beta (power)

Design <- createDesign(N = 50,
                       d = c(.2, .5, .8),
                       sig.level = NA)
Design    # solve for NA's

# all other function definitions same as above

# search for alpha within [.0001, .8]
solved <- SimSolve(design=Design, b = .8, interval=c(.0001, .8),
                   generate=Generate, analyse=Analyse,
                   summarise=Summarise, integer=FALSE)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)

# plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')

# verify with true power from pwr package
library(pwr)
pwr.t.test(n=50, power = .8, d = .2, sig.level=NULL)
pwr.t.test(n=50, power = .8, d = .5, sig.level=NULL)
pwr.t.test(n=50, power = .8, d = .8, sig.level=NULL)

# use estimated alpha results to see how close power was
pwr.t.test(n=50, d = .2, sig.level=solved$sig.level[1])
pwr.t.test(n=50, d = .5, sig.level=solved$sig.level[2])
pwr.t.test(n=50, d = .8, sig.level=solved$sig.level[3])

### failing analytic formula, confirm results with more precise
### simulation via runSimulation() (not required; if accuracy is important
### ProBABLI should just be run longer)
# confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,
#                          generate=Generate, analyse=Analyse,
#                          summarise=Summarise)
# confirm

## End(Not run)
This function collapses the simulation results within each condition into composite
estimates such as RMSE, bias, Type I error rates, coverage rates, etc. See the
See Also section below for useful functions that can be used within Summarise.
Summarise(condition, results, fixed_objects)
condition | a single row from the design input (as a data.frame), indicating the simulation conditions
results | a data.frame or list containing the results returned from each replication of the analyse function
fixed_objects | object passed down from runSimulation
For best results, the function should return a named numeric vector or data.frame
with the desired meta-simulation results. Named list objects can also be returned,
however the subsequent results must be extracted via SimExtract.
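For example, the two supported return styles can be sketched as follows; the
element names and the SimExtract call shown are illustrative assumptions:

# named numeric vector: appended directly to the design object
Summarise <- function(condition, results, fixed_objects) {
    c(power = EDR(results, alpha = .05))
}

# named list: stored internally and retrieved afterwards, e.g., via
# SimExtract(res, what = 'summarise')
Summarise <- function(condition, results, fixed_objects) {
    list(pvals = results)
}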
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
bias, RMSE, RE, EDR, ECR, MAE, SimExtract
## Not run: 
summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (alpha < .1, .05, .01)
    lessthan.05 <- EDR(results, alpha = .05)

    # return the results that will be appended to the design input
    ret <- c(lessthan.05=lessthan.05)
    ret
}

## End(Not run)
Format a time input string into a suitable numeric output metric (e.g., seconds).
Input follows the SBATCH utility specifications. Accepted time formats include
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours",
"days-hours:minutes", and "days-hours:minutes:seconds".
timeFormater(time, output = "sec")
time | a character string to be formatted. If a numeric vector is supplied then this will be interpreted as seconds
output | type of numeric output to convert the time into. Currently supported outputs include 'sec' for seconds (the default) and 'hour' for hours
For example, max_time = "60" indicates a maximum time of 60 minutes,
max_time = "03:00:00" a maximum time of 3 hours, max_time = "4-12" a maximum of
4 days and 12 hours, and max_time = "2-02:30:00" a maximum of 2 days, 2 hours,
and 30 minutes.
# Test cases (outputs in seconds)
timeFormater("4-12")       # day-hours
timeFormater("4-12:15")    # day-hours:minutes
timeFormater("4-12:15:30") # day-hours:minutes:seconds

timeFormater("30")         # minutes
timeFormater("30:30")      # minutes:seconds
timeFormater("4:30:30")    # hours:minutes:seconds

# output in hours
timeFormater("4-12", output = 'hour')
timeFormater("4-12:15", output = 'hour')
timeFormater("4-12:15:30", output = 'hour')

timeFormater("30", output = 'hour')
timeFormater("30:30", output = 'hour')
timeFormater("4:30:30", output = 'hour')
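As a quick sanity check, the conversions follow directly from the format
definitions above (the arithmetic is computed by hand):

timeFormater("2-02:30:00")    # days-hours:minutes:seconds
2*24*60*60 + 2*60*60 + 30*60  # = 181800 seconds

timeFormater("2-02:30:00", output = 'hour')
181800 / 3600                 # = 50.5 hours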