Parameter space exploration
Parameter space exploration makes it easy to try different sets of parameters, measure their impact on analysis performance, and guide method optimization for new matrices, targets, or databases.
Given a grid of parameter combinations, foodme is run several times, each time with a different parameter set. Performance metrics are extracted for each run using an expected sample composition, and the metrics of all runs are then aggregated.
You will need to provide a foodme configuration file (see configuration) containing default values, and a parameter grid in which each row holds a set of parameter values to update.
Parameter grid
The parameter grid should be organized as a tab-delimited text file whose first row contains the names of the parameters to vary. Each following row contains one set of parameter values that will be used to update the default configuration.
For example, with the file below, three independent foodme runs will be triggered: ASV clustering, dereplication (OTU clustering at 100% identity), and OTU clustering at 97% identity. All other parameters will be taken from the default configuration file.
cluster_method | cluster_identity |
---|---|
asv | 1 |
otu | 1 |
otu | 0.97 |
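As a minimal sketch, the grid above could be written to disk as follows. The file name `paramspace.tsv` is an assumption chosen for illustration; the final `awk` call is only a sanity check that every row has as many tab-separated fields as the header.

```shell
# Write the three-row example grid as a tab-delimited file.
# "paramspace.tsv" is a hypothetical file name, pick any name you like.
grid="paramspace.tsv"
printf 'cluster_method\tcluster_identity\n' > "$grid"
printf 'asv\t1\n'    >> "$grid"
printf 'otu\t1\n'    >> "$grid"
printf 'otu\t0.97\n' >> "$grid"

# Sanity check: report any row whose field count differs from the header,
# and exit with a non-zero status if one is found.
awk -F '\t' '
    BEGIN   { bad = 0 }
    NR == 1 { n = NF }
    NF != n { print "line " NR ": expected " n " fields, got " NF; bad = 1 }
    END     { exit bad }
' "$grid"
```

Using `printf` with explicit `\t` avoids editors silently converting tabs to spaces, which would break the tab-delimited format.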
Configuration
The parameter space exploration mode only requires a few arguments, which you can pass to snakemake either as a YAML configuration file (see the template in config/) or directly on the command line with the --config argument.
Parameter | Expected values | Description |
---|---|---|
workdir | Path | Path to the output directory, will be created if it doesn't exist |
foodme_config | Path | Path to the foodme configuration file. |
paramspace | Path | Path to the parameter grid |
force_rerun | True/False | Whether to force a recalculation of the foodme results for all parameter combinations. Equivalent to the --forceall directive. |
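For illustration, a configuration file with the four parameters from the table could be created like this. The file name `paramspace_config.yaml` and all paths are hypothetical placeholders, not values shipped with foodme; replace them with your own.

```shell
# Write a minimal YAML configuration for the parameter space workflow.
# The file name and every path below are hypothetical placeholders.
config="paramspace_config.yaml"
cat > "$config" <<'EOF'
workdir: /home/user/paramspace_analysis
foodme_config: /home/user/foodme_config.yaml
paramspace: /home/user/paramspace.tsv
force_rerun: False
EOF

# Show the resulting file.
cat "$config"
```

The quoted heredoc delimiter (`'EOF'`) keeps the shell from expanding anything inside the file, so the YAML is written exactly as typed.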
Note
As parameter space exploration only makes sense if you can compare the results
to known sample compositions, it is required to provide values for the benchmark_*
parameters. These values can be part of the parameter space exploration too!
Warning
In parameter space mode, the foodme configuration file only supports absolute file paths. Relative paths will result in errors; you've been warned.
Running a parameter space analysis
The parameter space analysis is organized as a workflow separate from foodme. It is therefore necessary to point snakemake to the correct workflow definition:
conda activate snakemake
snakemake -s ~/FooDMe/workflow/paramspace \
--use-conda --conda-prefix ~/conda-envs --cores 1 \
--configfile ~/FooDMe/config/myconfig.yaml
Warning
The command --forceall will not force a re-run of foodme. To trigger a rerun, you will have to set the force_rerun parameter to True in the configuration or manually delete the foodme_runs folder.
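One way to set force_rerun for a single invocation, without editing the configuration file, is to override it with snakemake's `--config` argument, whose values take precedence over those in `--configfile`. The sketch below wraps this in a shell function; the function name `rerun_paramspace` is hypothetical, and the paths are simply those of the example command above.

```shell
# Hypothetical wrapper: run the paramspace workflow and force all foodme
# runs to be recomputed by overriding force_rerun on the command line.
# Paths are taken from the example command; adjust them to your installation.
rerun_paramspace() {
    snakemake -s ~/FooDMe/workflow/paramspace \
        --use-conda --conda-prefix ~/conda-envs --cores 1 \
        --configfile ~/FooDMe/config/myconfig.yaml \
        --config force_rerun=True
}
```

After `conda activate snakemake`, calling `rerun_paramspace` starts the forced run. Defining the function does not execute anything by itself.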
Results
The parameter space exploration will produce an aggregate of foodme benchmark
results: yields, metrics, PR-curves, and confusion matrices, all with indication
of the specific parameter set that was used for each run.
Additionally, the resource usage (CPU, memory, I/O) is measured for each foodme run and saved as a table. See also snakemake's documentation.
The output is organized as follows: