Configuration

Parameters can be freely set to fit specific analytical needs. Different sets of parameters should be saved to configuration files in YAML format. This ensures reproducible analysis of different samples over time.

A template of such a configuration file is stored in the repository under FooDME/config/config.yaml.

We include an optimized configuration file for use in 16S meat metabarcoding experiments with the Dobrovolny et al. (2019) method. The config folder will be populated with configuration sets for other matrices when possible. Feel free to submit yours!

Warning

Path to reference files are system dependent and will still need to be changed even for preset configurations.

How to use to configuration file

Modify the values of each parameters as you need (see table below). Then save your own configuration locally (for example under FooDme/config/). This configuration can be reused for successive analysis.

Sample sheet

The input files must be linked using a sample sheet. A template for such a file is available under FooDME/config/samples.tsv. and takes the following form:

sample	fq1	fq2
nameA	A_R1.fastq.gz	A_R2.fastq.gz
nameB	B_R1.fastq.gz	B_R2.fastq.gz

Simply modify the template with your own files or use the provided script to automatically generate a sample sheet from FASTQ files in a folder:

bash ~/FooDMe/ressources/create_sampleSheet.sh --mode illumina --fastxDir ~/raw_data

This will create a file called samples.tsv in the raw_data folder.

Note

The above command assumes that FASTQ files are named according to Illumina naming standards. Different naming standards are available. Use --help to see more options.

Info

This tool was developed by the Federal institute of risk assessment (BfR) in Berlin. More information is available in their repository.

List of parameters

Parameter	Expected values	Description
`workdir`	Path	Path to the output directory, will be created if it doesn´t exist
`samples`	Path	Path to the sample sheet
`threads_sample`	Number	Number of threads assigned to each job
`threads`	Number	Number of threads assigned to the workflow
`primers_fasta`	Path	Path to the fasta file containing primer sequences. IUPAC ambiguous nuclotides are accepted.
`blast_DB`	Path	Path to the BLAST database in the form `path/to/folder/db-name`
`taxdb`	Path	Path to the folder containing the `taxdb`files
`rankedlineage_dmp`	Path	Path to the `rankedlineage.dmp` file from the `taxdump` archive
`nodes_dmp`	Path	Path to the `nodes.dmp` file from the `taxdump` archive
`read_length_required`	Number	Minimal length of the reads after primer trimming
`qualifier-quality-phred`	Number	Minimal quality value per nucleotide
`qctrim_window-size`	Number	Size of the sliding window for 3´ quality trimming
`qctrim_mean_quality`	Number	Minimal quality thresold for sliding average
`trim_primers_3end`	True/False	Should primers be trimmed on the 3´ end of the reads? Only relevant if the sequencing length is larger than the amplicon length
`primer_error_rate`	Number [0, 1]	Maximum error-rate allowed for primer matching
`amplicon_min_length`	Number	Minimal length of the merge reads or ASV sequences
`amplicon_max_length`	Number	Maximal lenght of the merge reads or ASV sequences
`max_expected_errors`	Number	Maximum number of expected errors allowed in merged reads (OTUs) or trimmed reads (ASVs)
`max_ns`	Number	Maximal number of undetermined `N` nucleotide per sequence. This will automatically be set to 0 for ASVs
`cluster_method`	`otu` or `asv`	Clustering method
`cluster_identity`	Number [0, 1]	OTU identity threshold. Only for OTU
`cluster_minsize`	Number	Minimal size of clusters to keep
`merging_max_mismatch`	Number	Maximum number of mismatch allowed in the overlap between reads. Only for ASV
`remove_chimera`	True/False	Should predicted chimeric sequences be removed?
`min_consensus`	Number [0.51, 1]	Minimal agreement for taxonomic consensus determination. 0.51 is a majority vote and 1.0 is a last common ancestor determination
`taxid_filter`	Taxonomic identifier	Node under which to perform the BLAST search. Equivalent to pruning the taxonomy above this node. Use the Root Node number to keep the entire taxonomy
`blocklist`	`extinct` or custom path	Path to a list of taxonomic identifier to exclude from the BLAST search
`seq_blocklist`	`None` or custom path	Path to a list of sequence accessions (e.g. `NC_0016400`) to exclude from the results
`blast_filter_low_complexity`	True/False	Wether to mask low-complexity regions in the BLAST search. On by default, deactivate if you expect barcode sequences with low complexity.
`blast_evalue`	Number (scientific)	Minimal E-value threshold for the BLAST search
`blast_identity`	Number [0, 100]	Minimal identity (in percent) between the query and hit sequence for the BLAST search
`blast_qcov`	Number [0, 100]	Percent of the query to be covered by the hit sequence for the BLAST search
`bit_score_diff`	Number	Maximum bit-score difference between the best and last hit to keep for each query after the BLAST search
`benchmark_reference`	Path	Path to benchmarking reference table (see Benchmark mode).
`benchmark_threshold`	Number [0, 100]	Lower limit for benchmarking (see Benchmark mode).
`benchmark_rank`	String	Highest rank for benchmarking (see Benchmark mode).

Note

Unless you are running foodme in benchmark mode, you do not need to modify the values of the benchmark_* arguments. See Benchmark mode for more details.