RF JackKnife takes one or more XML reactivity files and a set of reference RNA structures in dotbracket notation, and iteratively calls rf-fold while varying the slope and intercept folding parameters. This is useful for calibrating the folding parameters for a specific probing reagent or experiment type.
It produces a CSV table reporting the FMI (Fowlkes-Mallows index, the geometric mean of PPV and sensitivity) or the mFMI (modified FMI; for additional details check the Metrics section of RF Compare) for each slope/intercept pair.
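As an illustration of the metric, the following minimal Python sketch computes the FMI as the geometric mean of PPV and sensitivity from two dotbracket strings. The function names `base_pairs` and `fmi` are hypothetical and not part of RNA Framework. The sketch assumes strict base-pair matching with round brackets only; it does not model the relaxed criteria, the mFMI, or the handling of pseudoknots and lonely pairs.

```python
import math


def base_pairs(dotbracket):
    """Return the set of (i, j) base pairs encoded by a dotbracket string.

    Only round brackets are handled (assumption: no pseudoknot symbols).
    """
    stack, pairs = [], set()
    for i, ch in enumerate(dotbracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def fmi(predicted, reference):
    """Geometric mean of PPV and sensitivity, using exact base-pair matches."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    ppv = true_positives / len(pred)          # fraction of predicted pairs that are correct
    sensitivity = true_positives / len(ref)   # fraction of reference pairs that are recovered
    return math.sqrt(ppv * sensitivity)


# One of the two reference pairs is predicted: PPV = 1, sensitivity = 0.5
print(fmi("(....)", "((..))"))
```

A prediction identical to the reference scores 1, while a prediction sharing no base pair with the reference scores 0.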
Combining multiple experiments
Since version 2.9.0 it is possible to identify a single slope/intercept pair that yields the best prediction across multiple experiments.
In the following example:
$ rf-jackknife -x -r reference.db experiment_1/ experiment_2/ experiment_3/
slope and intercept will be optimized on the reference transcripts present in reference.db. All transcripts, including those that are present only in a subset of the experiments, will by default be used for parameter optimization. If parameter -oc is enabled, however, only transcripts for which reactivity data is available across all experiments will be used.
For additional details on how multiple replicates are combined into a single prediction, please refer to the "Combining multiple experiments" paragraph of the RF Fold documentation page.
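The transcript selection rule described above can be sketched in Python as follows. This is only an illustration of the described behaviour, not the tool's actual code: the function name `transcripts_for_optimization` and the transcript IDs are hypothetical, and each experiment is represented simply as the set of transcript IDs for which it provides reactivity data.

```python
def transcripts_for_optimization(reference_ids, experiments, only_common=False):
    """Pick the reference transcripts used for slope/intercept optimization.

    reference_ids: IDs of the transcripts with a reference structure
    experiments:   one collection of covered transcript IDs per experiment
    only_common:   mimics the documented effect of -oc (assumption)
    """
    covered = [set(ids) for ids in experiments]
    if only_common:
        # keep only transcripts covered by every experiment
        selected = set.intersection(*covered)
    else:
        # default: keep transcripts covered by at least one experiment
        selected = set.union(*covered)
    return sorted(selected & set(reference_ids))


# Hypothetical transcript IDs, for illustration only
reference = ["rna1", "rna2", "rna3"]
experiments = [{"rna1", "rna2"}, {"rna1", "rna3"}, {"rna1"}]
print(transcripts_for_optimization(reference, experiments))
print(transcripts_for_optimization(reference, experiments, only_common=True))
```

In this toy case the default mode keeps all three transcripts, while the only-common mode keeps just the one covered by all three experiments.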
Usage
$ rf-jackknife [options] XML_dir_1/ XML_dir_2/ .. XML_dir_N/
$ rf-jackknife [options] rna1.xml rna2.xml .. rnaN.xml
To list all available parameters, simply type:
$ rf-jackknife -h
| Parameter | Type | Description |
|---|---|---|
| -r or --reference | string | A file containing reference structures in Vienna format (dotbracket notation) |
| -oc or --only-common | | In case of replicates, only transcripts covered across all experiments will be used to derive the optimal slope/intercept pair |
| -p or --processors | int | Number of processors to use (Default: 1) |
| -o or --output-dir | string | Output directory (Default: rf_jackknife/) |
| -ow or --overwrite | | Overwrites output directory (if the specified path already exists) |
| -g or --img | | Generates heatmap of grid search results (requires R) |
| -sl or --slope | float,float | Range of slope values to test (Default: 0,5) |
| -in or --intercept | float,float | Range of intercept values to test (Default: -3,0) |
| -ss or --slope-step | float | Step for testing slope values (Default: 0.2) |
| -is or --intercept-step | float | Step for testing intercept values (Default: 0.2) |
| -x or --relaxed | | Uses relaxed criteria (Deigan et al., 2009) to calculate the FMI |
| -m or --mFMI | | Uses modified FMI (mFMI, Lan et al., 2022; for additional details check the Metrics section of RF Compare) instead of standard FMI to quantify the agreement between predicted and reference structure |
| -kp or --keep-pseudoknots | | Keeps pseudoknotted basepairs in reference structure |
| -kl or --keep-lonelypairs | | Keeps lonely basepairs (helices of length 1 bp) in reference structure |
| -i or --ignore-sequence | | Ignores sequence differences (e.g. SNVs) between the compared structures |
| -e or --median | | The FMI across multiple reference structures is aggregated by median. Note: by default, FMI values are aggregated by geometric mean |
| -am or --arithmetic-mean | | The FMI across multiple reference structures is aggregated by arithmetic mean. Note: by default, FMI values are aggregated by geometric mean |
| -d or --decimals | int | Number of decimals for reporting FMI/mFMI (1-10, Default: 3) |
| -rf or --rf-fold | string | Path to rf-fold executable (Default: assumes rf-fold is in PATH) |
| -rp or --rf-fold-params | string | Manually specify additional RF Fold parameters (e.g. -rp "-md 500 -m 2") |
| -R or --R-path | string | Path to R executable (Default: assumes R is in PATH) Note: also check $RF_RPATH under Environment variables |
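
To make the grid search parameters more concrete, the sketch below enumerates the slope/intercept pairs implied by the default ranges and steps, and shows the three aggregation modes (geometric mean, median, arithmetic mean). It is not the tool's implementation: the helper names `grid_values` and `aggregate` are hypothetical, both range endpoints are assumed to be included in the grid, and a geometric mean of 0 is assumed whenever one of the scores is 0.

```python
import math
import statistics


def grid_values(lower, upper, step):
    """Evenly spaced values from lower to upper (assumption: both included)."""
    n_steps = round((upper - lower) / step)
    # index-based stepping avoids accumulating floating point error
    return [round(lower + k * step, 10) for k in range(n_steps + 1)]


def aggregate(scores, method="geometric"):
    """Combine the per-structure FMI values for one slope/intercept pair."""
    scores = list(scores)
    if method == "median":
        return statistics.median(scores)
    if method == "arithmetic":
        return statistics.fmean(scores)
    # geometric mean; returning 0 when any score is 0 is an assumption
    if any(s == 0 for s in scores):
        return 0.0
    return math.exp(statistics.fmean([math.log(s) for s in scores]))


slopes = grid_values(0, 5, 0.2)        # default -sl 0,5 with -ss 0.2
intercepts = grid_values(-3, 0, 0.2)   # default -in -3,0 with -is 0.2
grid = [(s, i) for s in slopes for i in intercepts]
print(len(slopes), len(intercepts), len(grid))
```

Under the inclusive-endpoint assumption, the defaults give 26 slope values and 16 intercept values, that is 416 pairs, each requiring one round of folding per reference transcript. Coarser steps or narrower ranges reduce the running time accordingly.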
Output CSV files
RF JackKnife produces a CSV file reporting the FMI (or mFMI) for each intercept (x-axis) and slope (y-axis) value pair.
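A table of this kind can be scanned for the best-scoring pair with a few lines of Python. The sketch below rests on assumptions about the file layout that should be checked against an actual output file: the first row is taken to hold the intercept values, the first column the slope values, and the field delimiter is left as a parameter. The function name `best_pair` is hypothetical and the numbers in the toy table are made up for illustration; they are not real RF JackKnife output.

```python
import csv
import io


def best_pair(csv_text, delimiter=","):
    """Return (slope, intercept, score) for the highest score in the table.

    Assumed layout: header row = intercepts, first column = slopes.
    """
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    intercepts = [float(value) for value in rows[0][1:]]
    best = None
    for row in rows[1:]:
        slope = float(row[0])
        for intercept, cell in zip(intercepts, row[1:]):
            score = float(cell)
            if best is None or score > best[2]:
                best = (slope, intercept, score)
    return best


# Made-up toy table, for illustration only
toy_table = "slope/intercept,-0.4,-0.2\n1.0,0.61,0.70\n1.2,0.82,0.75\n"
print(best_pair(toy_table))
```

For a real file, the text can be obtained with `open(path).read()` and passed to the function together with the delimiter actually used in the file.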