RF JackKnife takes one or more XML reactivity files and a set of reference RNA structures in dot-bracket notation, and iteratively calls rf-fold
while tuning the slope and intercept folding parameters. This is useful for calibrating the folding parameters for a specific probing reagent or experiment type.
It produces a CSV table reporting, for each slope/intercept pair, either the FMI (Fowlkes-Mallows index, the geometric mean of PPV and sensitivity) or the mFMI (modified FMI; for additional details, see the Metrics section of RF Compare).
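To make the metric concrete, here is a minimal Python sketch of the standard (non-relaxed) FMI computed from two dot-bracket strings. It is not RF Compare's implementation. It only handles `()` brackets, counts a base pair as correct only when the exact same pair appears in both structures, and returns 0.0 when either structure has no pairs (an assumption made for this sketch). The function names are hypothetical.

```python
import math


def pairs_from_dotbracket(db):
    """Return the set of (i, j) base pairs (0-based, i < j) in a dot-bracket string.

    Only "()" brackets are handled; pseudoknot brackets such as "[]" are
    ignored in this hypothetical sketch.
    """
    stack = []
    pairs = set()
    for pos, char in enumerate(db):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def fmi(predicted_db, reference_db):
    """Fowlkes-Mallows index: geometric mean of PPV and sensitivity (exact-match pairs)."""
    predicted = pairs_from_dotbracket(predicted_db)
    reference = pairs_from_dotbracket(reference_db)
    if not predicted or not reference:
        return 0.0  # assumption: no pairs on either side scores 0
    true_positives = len(predicted & reference)
    ppv = true_positives / len(predicted)
    sensitivity = true_positives / len(reference)
    return math.sqrt(ppv * sensitivity)


reference = "((((....))))"
predicted = "(((......)))"
print(round(fmi(predicted, reference), 3))  # 0.866 (PPV = 1.0, sensitivity = 0.75)
```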
Combining multiple experiments
Since version 2.9.0, it is possible to identify a single slope/intercept pair that yields the best prediction across multiple experiments.
In the following example:
$ rf-jackknife -x -r reference.db experiment_1/ experiment_2/ experiment_3/
slope and intercept will be optimized on the reference transcripts present in reference.db (here, -x enables the relaxed FMI criteria). By default, all transcripts, including those that are present only in a subset of the experiments, will be used for parameter optimization. If parameter -oc is enabled, however, only transcripts for which reactivity data is available across all experiments will be used.
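The effect of -oc on which reference transcripts enter the optimization can be sketched as a set operation. This is a hypothetical Python illustration of the rule described above (union of covered transcripts by default, intersection when -oc is set), not RF JackKnife's actual code; the function name and transcript IDs are made up.

```python
def transcripts_for_optimization(reference_ids, experiments, only_common=False):
    """Select the reference transcripts used to optimize slope/intercept.

    reference_ids: IDs of transcripts in the reference dot-bracket file.
    experiments:   one set of transcript IDs with reactivity data per experiment.
    only_common:   if True, mimic -oc and keep only transcripts covered in
                   every experiment; otherwise keep any covered transcript.
    """
    if not experiments:
        return set()
    if only_common:
        covered = set.intersection(*experiments)
    else:
        covered = set.union(*experiments)
    # Only transcripts that also have a reference structure can be scored.
    return set(reference_ids) & covered


reference = {"RNA_A", "RNA_B", "RNA_C"}
experiments = [
    {"RNA_A", "RNA_B"},           # experiment_1/
    {"RNA_A", "RNA_C"},           # experiment_2/
    {"RNA_A", "RNA_B", "RNA_D"},  # experiment_3/
]
print(sorted(transcripts_for_optimization(reference, experiments)))
# ['RNA_A', 'RNA_B', 'RNA_C']
print(sorted(transcripts_for_optimization(reference, experiments, only_common=True)))
# ['RNA_A']
```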
For additional details on how multiple replicates are combined into a single prediction, please refer to the "Combining multiple experiments" section of the RF Fold documentation.
Usage
To list the required parameters, simply type:
$ rf-jackknife -h
Parameter | Type | Description |
---|---|---|
-r or --reference | string | A file containing reference structures in Vienna format (dotbracket notation) |
-oc or --only-common | | In case of replicates, only transcripts covered across all experiments will be used to derive the optimal slope/intercept pair |
-p or --processors | int | Number of processors to use (Default: 1) |
-o or --output-dir | string | Output directory (Default: rf_jackknife/) |
-ow or --overwrite | | Overwrites the output directory (if the specified path already exists) |
-g or --img | | Generates a heatmap of grid search results (requires R) |
-sl or --slope | float,float | Range of slope values to test (Default: 0,5) |
-in or --intercept | float,float | Range of intercept values to test (Default: -3,0) |
-ss or --slope-step | float | Step for testing slope values (Default: 0.2) |
-is or --intercept-step | float | Step for testing intercept values (Default: 0.2) |
-x or --relaxed | | Uses relaxed criteria (Deigan et al., 2009) to calculate the FMI |
-m or --mFMI | | Uses the modified FMI (mFMI, Lan et al., 2022; for additional details, see the Metrics section of RF Compare) instead of the standard FMI to quantify the agreement between predicted and reference structures |
-kp or --keep-pseudoknots | | Keeps pseudoknotted basepairs in reference structures |
-kl or --keep-lonelypairs | | Keeps lonely basepairs (helices of length 1 bp) in reference structures |
-i or --ignore-sequence | | Ignores sequence differences (e.g. SNVs) between the compared structures |
-e or --median | | The FMI across multiple reference structures is aggregated by median. Note: by default, FMI values are aggregated by geometric mean |
-am or --arithmetic-mean | | The FMI across multiple reference structures is aggregated by arithmetic mean. Note: by default, FMI values are aggregated by geometric mean |
-rf or --rf-fold | string | Path to rf-fold executable (Default: assumes rf-fold is in PATH) |
-rp or --rf-fold-params | string | Manually specify additional RF Fold parameters (e.g. -rp "-md 500 -m 2") |
-R or --R-path | string | Path to R executable (Default: assumes R is in PATH). Note: also check $RF_RPATH under Environment variables |
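Conceptually, the grid search defined by -sl, -in, -ss and -is, combined with the aggregation options -e and -am, can be sketched as follows. This is a simplified Python illustration under stated assumptions: range endpoints are treated as inclusive (so the defaults would give 26 slope values × 16 intercept values = 416 pairs), and `toy_score` is a hypothetical stand-in for running rf-fold with a given slope/intercept and computing one FMI per reference structure. RF JackKnife's actual loop, its parallelization (-p) and its handling of zero FMI values may differ.

```python
import math
import statistics


def value_range(start, end, step):
    """Values from start to end in increments of step (endpoints assumed inclusive)."""
    n_steps = int(round((end - start) / step))
    # Rounding removes floating-point noise such as 1.8000000000000003.
    return [round(start + k * step, 10) for k in range(n_steps + 1)]


def aggregate(values, method="geometric"):
    """Combine per-reference FMI values into one score.

    "geometric" mimics the default, "median" mimics -e, "arithmetic" mimics -am.
    """
    if method == "geometric":
        # Any zero FMI makes the geometric mean zero.
        return math.prod(values) ** (1 / len(values))
    if method == "median":
        return statistics.median(values)
    if method == "arithmetic":
        return statistics.mean(values)
    raise ValueError(f"unknown aggregation method: {method}")


def grid_search(score_fn, slopes, intercepts, method="geometric"):
    """Score every slope/intercept pair; return (table, best_pair)."""
    table = {}
    for slope in slopes:
        for intercept in intercepts:
            per_reference_fmi = score_fn(slope, intercept)
            table[(slope, intercept)] = aggregate(per_reference_fmi, method)
    best = max(table, key=table.get)
    return table, best


def toy_score(slope, intercept):
    """Hypothetical stand-in for: run rf-fold, compare to each reference, return FMIs."""
    base = max(0.0, 1.0 - 0.1 * abs(slope - 1.8) - 0.1 * abs(intercept + 0.6))
    return [base, 0.9 * base]  # two made-up reference structures


slopes = value_range(0, 5, 0.2)       # default -sl 0,5 with -ss 0.2
intercepts = value_range(-3, 0, 0.2)  # default -in -3,0 with -is 0.2
table, best = grid_search(toy_score, slopes, intercepts)
print(len(table), best)  # 416 (1.8, -0.6)
```

The geometric mean (the default) penalizes a parameter pair that performs very poorly on even one reference structure, whereas the median is more robust to a single outlier.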
Output CSV files
RF JackKnife produces a CSV file reporting the FMI (or mFMI) for each slope/intercept pair, with intercept values on the x-axis and slope values on the y-axis.
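As a sketch of how this table might be post-processed, the following Python snippet finds the highest-scoring slope/intercept pair in such a CSV file. The layout it expects (a header row with an empty corner cell followed by intercept values, then one row per slope value) is an assumption based on the axis description above; check it against an actual output file before relying on it. The numbers in the example are made up, not real RF JackKnife output.

```python
import csv
import io


def best_pair(csv_text):
    """Return (slope, intercept, score) of the highest-scoring cell.

    Assumes a hypothetical layout: the header row holds an empty corner
    cell followed by intercept values; every following row starts with a
    slope value followed by one score per intercept.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    intercepts = [float(x) for x in rows[0][1:]]
    best = None
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        slope = float(row[0])
        for intercept, cell in zip(intercepts, row[1:]):
            score = float(cell)
            if best is None or score > best[2]:
                best = (slope, intercept, score)
    return best


example = """,-0.6,-0.4,-0.2
1.6,0.71,0.74,0.70
1.8,0.73,0.78,0.72
2.0,0.69,0.75,0.71
"""
print(best_pair(example))  # (1.8, -0.4, 0.78)
```

To use it on a real file, pass the file contents, e.g. `best_pair(open(path).read())`, where `path` points to the CSV file in the output directory.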