Config

Tip

wr conf -h explains configuration as well.

Running wr conf --default will print an example config file which details all the config options available.

The default config should be fine for most people, but if you want to change something, run wr conf --default > ~/.wr_config.yml and make changes to that (or save it as one of the other possible config files). Alternatively, export environment variables.

Tip

If you’ll be using OpenStack, it is strongly recommended to configure that database backups go to S3 by setting managerdbbkfile: s3://yourbucket/yourpath.

Running just wr conf will show you your currently configured values, and remind you where that configurtion is set (which files, or if it is set in an environment variable, or if it’s a default).

Config files

wr will load its configuration settings from one or more files named .wr_config[.production|.development].yml found in these directories, in order of precedence:

The current directory
Your home directory
The directory pointed to by the environment variable $WR_CONFIG_DIR

.wr_config.yml files are always read, and can be used to define settings common to both production and development deployments.

.wr_config.development.yml files are only read in a development context:: Either a --deployment development option has been passed to the wr executable, or the environment variable $WR_DEPLOYMENT has been set to ‘development’. Or the current working directory contains a checkout of the wr git repository.
.wr_config.production.yml files are only read in a production context:: Either a --deployment production option has been passed to the wr executable, or the environment variable $WR_DEPLOYMENT has been set to ‘production’. Or we are not in a development context, where we default to production.

Note

If you use config files, these must be readable by all nodes; when you don’t have a shared disk, it’s best to configure using environment variables.

In cloud deployments where wr itself creates compute nodes, a config file will be created on new nodes automatically.

Environment variables

You can set and override config settings by defining environment variables named like: WR_<setting name in caps>. Eg. to define the managerscheduler option you might do:

export WR_MANAGERSCHEDULER="lsf"

Note

Environment variables will apply to both deployments, so you shouldn’t set options that must be different between them, such as ports, unless you’re careful to also change WR_DEPLOYMENT at the same time, or you will only ever be running one of the deployments.

Ports

The wr manager needs 2 ports to operate, one for the wr executable, one for the web interface and REST API. By default, it will calculate 4 ports based on your uid (2 for each deployment), so that different people starting the manager on the same machine will get different ports.

This calculation will fail if you have a very high uid. In this case, when you start the manager it will prompt you to accept random ports that are currently available, and write these in to your config file.

Alternatively you could manually set managerport and managerweb in your config files (you must use different ones for your development and production deployments) to desired port numbers yourself.

Note

If you use an integration like Nextflow where it by default guesses your ports, you’ll need to instead configure Nexflow to use the same ports that are now specified in your wr config file.

Performance considerations

For the most part, you should be able to throw as many jobs at wr as you like, running on as many compute nodes as you have available, and trust that wr will cope. There are no performance-related parameters to fiddle with: fast mode is always on!

However you should be aware that wr’s performance will typically be limited by that of the disk you configure wr’s database to be stored on (by default it is stored in your home directory), since to ensure that workflows don’t break and recovery is possible after crashes or power outages, every time you add jobs to wr, and every time you finish running a job, before the operation completes it must wait for the job state to be persisted to disk in the database.

This means that in extreme edge cases, eg. you’re trying to run thousands of jobs in parallel, each of which completes in milliseconds, each of which want to add new jobs to the system, you could become limited by disk performance if you’re using old or slow hardware.

You’re unlikely to see any performance degradation even in extreme edge cases if using an SSD and a modern disk controller. Even an NFS mount could give more than acceptable performance.

But an old spinning disk or an old disk controller (eg. limited to 100MB/s) could cause things to slow to a crawl in this edge case. “High performance” disk systems like Lustre should also be avoided, since these tend to have incredibly bad performance when dealing with many tiny writes to small files.

Tip

Set the managerdbfile option in your config file to a path on a fast disk.

If this is the only hardware you have available to you, you can half the impact of disk performance by reorganising your workflow such that you add all your jobs in a single wr add call, instead of calling wr add many times with subsets of those jobs.