Checks if jobs were run with the same seed

Checks every log file in a folder for the recorded seed. Returns any duplicated seeds together with the corresponding job array indices.

Usage

check_rep_seeds(logs_path)

Arguments

logs_path: Character string with the path to the folder containing the logs. The folder should contain only log files, in plain text format.

Value

A data frame with four columns. Each row contains the information for one result with a duplicated seed. The data frame has no rows if there are no duplicated seeds in any of the logs. Columns are as follows:

Data: A character vector with the name of the data set where duplicates were found.
Models: A numeric with corresponding array index. Will be empty if no duplicates were found.
Seeds: A numeric with the corresponding seed that was duplicated.
Array_indices: A numeric with corresponding array index.
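
As a rough illustration of how the returned data frame might be inspected, the sketch below builds a stand-in object by hand with the four columns listed above. All values are invented for illustration and do not come from real logs; `dups`, `has_duplicates` and `indices_by_seed` are hypothetical names.

```r
# Hypothetical stand-in for the data frame returned by check_rep_seeds();
# every value below is invented purely for illustration.
dups <- data.frame(
  Data = c("dataset_a", "dataset_a", "dataset_b"),
  Models = c(1, 2, 1),
  Seeds = c(101, 101, 202),
  Array_indices = c(3, 7, 5),
  stringsAsFactors = FALSE
)

# A data frame with zero rows means no duplicated seeds were found.
has_duplicates <- nrow(dups) > 0

# Group the array indices by the seed they share, so that each list
# element holds the jobs that reused the same seed.
indices_by_seed <- split(dups$Array_indices, dups$Seeds)
```

Checking `nrow()` first is a simple way to decide whether any jobs need to be rerun with fresh seeds.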

Note

This function is the preferred method for checking for repeated seeds. However, it will fail on log files generated by older versions of the package, because it expects the seed and array information to always be at the same location in each log. If you encounter issues, try running check_rep_seeds_depr() instead. In all other cases, prefer this function, as it allows log output parsing to be better optimized.
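
One possible way to automate the fallback described in this note is sketched below. It rests on two assumptions that are not confirmed here: that a failure of check_rep_seeds() surfaces as an R error, and that check_rep_seeds_depr() accepts the same logs_path argument. The wrapper name `check_seeds_with_fallback` is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package. The defaults are evaluated
# lazily, so the package functions are only looked up when actually needed.
check_seeds_with_fallback <- function(logs_path,
                                      primary = check_rep_seeds,
                                      fallback = check_rep_seeds_depr) {
  tryCatch(
    primary(logs_path),
    error = function(e) {
      # Assumption: failure on older logs is signalled as an R error.
      message(
        "Primary check failed (", conditionMessage(e),
        "); trying the fallback function."
      )
      # Assumption: the fallback takes the same logs_path argument.
      fallback(logs_path)
    }
  )
}
```

Passing the two functions as arguments keeps the wrapper easy to try out with stand-in functions before pointing it at real logs.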

Author

Pedro Santos Neves

Examples
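
A minimal usage sketch follows, assuming the package providing check_rep_seeds() is attached. The folder path is hypothetical and should be replaced with the folder holding your plain text logs. The call is guarded so that the snippet does nothing when the function or folder is unavailable.

```r
# Hypothetical folder; replace with the folder containing your log files.
logs_path <- file.path("path", "to", "logs")

# Guarded so the snippet is harmless if the function or folder is missing.
if (exists("check_rep_seeds") && dir.exists(logs_path)) {
  dups <- check_rep_seeds(logs_path)
  if (nrow(dups) == 0) {
    message("No duplicated seeds found.")
  } else {
    # One row per result that was run with a duplicated seed.
    print(dups)
  }
}
```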