---
title: "Managing warning and error messages"
author: "Phil Chalmers"
date: "`r Sys.Date()`"
output:
  html_document:
    fig_caption: false
    number_sections: true
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
vignette: >
  %\VignetteIndexEntry{Managing warning and error messages}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r nomessages, echo = FALSE}
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  fig.height = 5,
  fig.width = 5
)
options(digits=4)
par(mar=c(3,3,1,1)+.1)
```

Error catching is an important area to consider when creating Monte Carlo simulations. Sometimes iterative algorithms will 'fail to converge', or will crash for other reasons (e.g., sparse data). Moreover, errors may arise only for particular combinations of the design factors under investigation, which can lead to the abrupt termination of a simulation's execution. `SimDesign` makes managing errors much easier because its internal functions are automatically wrapped within `try` blocks, and therefore simulations will not terminate unexpectedly. This information is also collected in the final simulation object, since it may be relevant to the simulation author that something is going wrong in the code-base. Below we demonstrate what happens when errors are thrown and caught, and how this information is tracked in the returned object.

# Define the functions

As usual, define the functions of interest.

```{r echo=FALSE}
set.seed(1)
```

```{r}
library(SimDesign)
# SimFunctions(comments=FALSE)

Design <- createDesign(N = c(10,20,30))
```

```{r}
Generate <- function(condition, fixed_objects) {
    ret <- with(condition, rnorm(N))
    ret
}

Analyse <- function(condition, dat, fixed_objects) {
    whc <- sample(c(0,1,2,3), 1, prob = c(.7, .20, .05, .05))
    if(whc == 0){
        ret <- mean(dat)
    } else if(whc == 1){
        ret <- t.test() # missing arguments
    } else if(whc == 2){
        ret <- t.test('invalid') # invalid arguments
    } else if(whc == 3){
        # throw error manually
        stop('Manual error thrown')
    }
    # manual warnings
    if(sample(c(TRUE, FALSE), 1, prob = c(.1, .9)))
        warning('This warning happens rarely')
    if(sample(c(TRUE, FALSE), 1, prob = c(.5, .5)))
        warning('This warning happens much more often')
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- c(bias = bias(results, 0))
    ret
}
```

The above simulation is just an example of how errors are tracked in `SimDesign`, as well as how to throw a manual error in case the data should be re-drawn based on the user's decision (e.g., when a model fails to converge within some predefined number of iterations).

# Run the simulation

```{r include=FALSE}
set.seed(1)
```

```{r}
result <- runSimulation(Design, replications = 100, generate=Generate,
                        analyse=Analyse, summarise=Summarise)
```

```{r}
print(result)
```

What you'll immediately notice from this output object is that counts of the error and warning messages have been appended to the `result` object. This is useful for determining just how problematic the errors and warnings are based on their frequency alone. Furthermore, the specific frequency with which the errors/warnings occurred is also available for each design condition (here the `t.test.default()` error, where no inputs were supplied, occurred more often than the manually thrown error and the invalid-input error), which can be seen after extracting and inspecting `SimExtract(result, what = 'errors')` and `SimExtract(result, what = 'warnings')`.
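For example, the tallied warning messages can be extracted and inspected directly (shown here with the `result` object created above; the exact counts depend on the seed), while the associated error messages are extracted in the chunk that follows.

```{r}
SimExtract(result, what = 'warnings')
```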
```{r}
SimExtract(result, what = 'errors')
```

Finally, `SimDesign` has a built-in safety feature controlled by the `max_errors` argument to avoid getting stuck in infinite redrawing loops. By default, if more than 50 errors are returned consecutively then the simulation condition will be halted, and a warning message will be printed to the console indicating the last observed fatal error. These safety features are built in because too many consecutive `stop()` calls generally indicate a major problem in the simulation code, which should be fixed before continuing. However, when fatal errors are encountered in a given simulation condition the remainder of the simulation experiment will still be executed as normal, and `NA` placeholders will be assigned to the problematic condition's rows in the final output object. This is so that the entire experiment does not unexpectedly terminate due to one or more problematic rows in `Design`, and instead these conditions can be inspected and debugged at a later time. Of course, if inspecting the code directly, the simulation could be manually halted so that these terminal errors can be attended to immediately (e.g., using `Ctrl + c`, or clicking the 'Stop' icon in RStudio).

# What to do (explicit debugging)

If errors occur too often (but not in a fatal way) then the respective design conditions should either be extracted out of the simulation or inspected further to determine whether they can be fixed (e.g., by providing better starting values, adjusting the convergence criteria, increasing the number of iterations, etc.). For instance, say that the fourth row of the `Design` object raised a number of error messages that should be inspected further. One useful approach would be to debug the 4th row at the moment an error is raised, which can be achieved using the following:

```{r eval=FALSE}
runSimulation(..., debug = 'error-4')
```

The `error` flag is used to enter R's debugger on the first instance of an error, while the `-4` indicates that only the 4th row of `Design` should be evaluated. This is also one instance where changing warning messages into error messages (i.e., `runSimulation(..., extra_options = list(warnings_as_errors=TRUE))`) is particularly useful, so that the state that generated a warning can be inspected directly. Note that similar flags exist for explicitly debugging the functions in the generate-analyse-summarise chain (e.g., `debug = 'analyse-4'`), though these are less useful for debugging errors (and more useful for initial code design).

## Manual debugging via `try()`

Failing the above approach, the problematic functions can be wrapped manually in a `try()` call. Adding the line `if(is(object, 'try-error')) browser()` will then jump into the replication where the error was unexpectedly witnessed, though admittedly this is a clunkier approach than using `debug`. Nevertheless, jumping into the exact location where the error occurred, particularly when an `analyse()` function is throwing multiple error messages, will greatly help you determine what exactly went wrong in the simulation state, allowing you to quickly locate and fix the issue.
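For example, a minimal sketch of this pattern might look like the following (purely for illustration, assuming the `t.test()` call inside `analyse()` is the problematic step):

```{r eval=FALSE}
Analyse <- function(condition, dat, fixed_objects) {
    # wrap the suspect call so that errors are caught rather than thrown
    ret <- try(t.test(dat)$p.value, silent = TRUE)

    # jump into the interactive debugger whenever the call failed
    if(is(ret, 'try-error')) browser()
    ret
}
```

Once the debugger opens, the current `condition`, `dat`, and captured error message can be inspected directly before typing `Q` to exit.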
## Extracting error seeds for hard-to-find bugs

An alternative approach to locating errors is to use information stored within the `SimDesign` objects at the time of completion. By default, all `.Random.seed` states associated with errors are stored within the final object, and these can be extracted using the `SimExtract(..., what = 'error_seeds')` option. This function returns a `data.frame` object with each seed stored column-wise, where the associated error message is contained in the column name itself (coerced into a syntactically valid column name to make it easier to use with the `$` operator). For example,

```{r}
seeds <- SimExtract(result, what = 'error_seeds')
head(seeds[,1:3])
```

Given these seeds, replicating an exact error can be achieved by a) extracting a single column into an `integer` vector, and b) passing this vector to the `load_seed` input. For example, replicating the first error message can be achieved as follows, where it makes the most sense to immediately enter the debugging mode via the `debug` input.

**Note**: It is important to manually select the correct `Design` row when using this error extraction approach; otherwise, the seed will clearly not replicate the exact problem state.

```{r eval=FALSE}
picked_seed <- seeds$Design_row_1.1..Error.in.t.test.default..invalid.....not.enough..x..observations.

# debug analyse() for first row of Design object via debug='analyse-1'
runSimulation(Design, replications = 100, load_seed=picked_seed, debug='analyse-1',
              generate=Generate, analyse=Analyse, summarise=Summarise)
```

The `.Random.seed` state will be loaded to this exact state, and will always be reloaded to this state as well (in case `c` is typed in the debugger, or the error is otherwise hard to locate while walking through the debug mode). Hence, users must type `Q` to exit the debugger after they have understood the nature of the error message first-hand.

# Converting warnings to errors explicitly

```{r include=FALSE}
knitr::opts_chunk$set(
  warning = TRUE
)
```

On occasion, functions will raise `warning` messages that border on (or should be treated as) error messages when they influence the veracity of the simulation results. Examples include models that appear to 'converge' but do so with nonsensical parameter estimates (e.g., negative variances, non-positive definite correlation matrices, the maximum number of iterations reached in a numerical search algorithm, etc.). However, because such issues are non-fatal, third-party software (i.e., functions not written by the developer of the simulation) may simply raise a `warning` message for further inspection, whereas in a Monte Carlo simulation experiment such results should generally be treated as invalid (though their frequency of occurrence should still be tracked, as is the default in `SimDesign`).

To resolve this issue, and to avoid a more nuclear option such as setting `options(warn=2)` to treat *all* warnings as errors in the simulation, the function `manageWarnings()` can be used to explicitly convert known warning message strings into errors that disrupt the code execution, while allowing other warning messages to continue to be raised as usual. For example, if a function utilized in a simulation were

```{r}
myfun <- function() {
    if(sample(c(TRUE, FALSE), 1, prob = c(.1, .9)))
        warning('This warning is serious')
    if(sample(c(TRUE, FALSE), 1, prob = c(.5, .5)))
        warning('This warning is no big deal')
    return(1)
}

set.seed(1)
out <- myfun()

set.seed(2)
out <- myfun()

set.seed(7)
out <- myfun()
```

then whenever the serious warning message is raised it can be explicitly converted to an error via an internal `grepl()` test of the warning's message string.
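Conceptually, this conversion amounts to something like the following (a rough sketch of the matching logic only, not the actual internals of `manageWarnings()`):

```{r eval=FALSE}
out <- withCallingHandlers(myfun(), warning = function(w) {
    # promote only the matching warning to an error; all others are raised as usual
    if(grepl('This warning is serious', conditionMessage(w), fixed = TRUE))
        stop(conditionMessage(w), call. = FALSE)
})
```

In practice, `manageWarnings()` performs this matching (and the associated bookkeeping) itself: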
```{r error=TRUE}
set.seed(1)
out1 <- manageWarnings(myfun(), warning2error='This warning is serious')
out1

set.seed(2)
out2 <- manageWarnings(myfun(), warning2error='This warning is serious')
out2

set.seed(7)
out3 <- manageWarnings(myfun(), warning2error='This warning is serious')
```

This now converts the previously serious warning message into an error message, thereby correctly disrupting the flow of the Monte Carlo simulation experiment and prompting a new call to `Generate()`. Of course, all warning and error messages are still tallied in the resulting `runSimulation()` object, though the serious warnings will now be tallied as disruptive errors rather than as warnings that allowed execution to continue as normal.
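In an actual simulation, this wrapper would typically surround the analysis call inside the `analyse()` step so that the flagged warning triggers a redraw of the data. A hypothetical sketch, reusing `myfun()` from above as a stand-in for the real analysis:

```{r eval=FALSE}
Analyse <- function(condition, dat, fixed_objects) {
    # any 'serious' warning raised here becomes an error, so runSimulation()
    # discards this replication and draws a fresh dataset instead
    ret <- manageWarnings(myfun(), warning2error = 'This warning is serious')
    ret
}
```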