RF Compare compares secondary structures inferred by RF Fold against a reference set of known secondary structures. For each comparison it reports 4 metrics: the PPV, the sensitivity, the FMI (Fowlkes-Mallows index) and the mFMI (modified FMI). For additional details, see the Metrics section below.
Reference structures can be provided in either Vienna (dot-bracket) or CT format. Since version 2.8.8, they can be provided either as a single file containing multiple structures, or as a folder of individual structure files.
The sequence ID of the reference structures must match the compared file's name (e.g. "Transcript#1" expects a file named "Transcript#1.ct" or "Transcript#1.db").
RF Compare can be invoked either on a single structure, or on an entire folder of RF Fold-predicted structure files. Structures can be provided in either CT or Vienna (dot-bracket) format.
RF Compare can also generate PDF graphical comparisons of each structure against its reference.
Metrics
Given a reference and a predicted structure as input, RF Compare calculates 4 metrics. Each metric ranges between 0 (more dissimilar structures) and 1 (more similar structures):
Positive Predictive Value (PPV)
The fraction of base-pairs present in the predicted structure that are also present in the reference structure
Sensitivity
The fraction of base-pairs present in the reference structure that are also present in the predicted structure
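The two definitions above can be illustrated with a minimal Python sketch. It is not RF Compare's own implementation: the function names `parse_dot_bracket` and `ppv_and_sensitivity` are hypothetical, only `(` and `)` are parsed (pseudoknot brackets are ignored), and returning 0.0 for a structure with no base-pairs is an assumption made here to avoid a division by zero.

```python
def parse_dot_bracket(structure):
    """Return the set of base-pairs (i, j), 1-based with i < j, of a dot-bracket string."""
    stack = []
    pairs = set()
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def ppv_and_sensitivity(reference, predicted):
    """Compute (PPV, sensitivity) for two dot-bracket strings of the same sequence."""
    ref_pairs = parse_dot_bracket(reference)
    pred_pairs = parse_dot_bracket(predicted)
    shared = len(ref_pairs & pred_pairs)
    # Assumption: a structure without base-pairs yields 0.0 instead of an error.
    ppv = shared / len(pred_pairs) if pred_pairs else 0.0
    sensitivity = shared / len(ref_pairs) if ref_pairs else 0.0
    return ppv, sensitivity


# Reference has 4 base-pairs, prediction has 3, and 2 of them are shared.
ppv, sensitivity = ppv_and_sensitivity("((((...))))", "(((....).))")
print(f"PPV = {ppv:.3f}, sensitivity = {sensitivity:.3f}")  # PPV = 0.667, sensitivity = 0.500
```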
Fowlkes-Mallows index (FMI)
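The geometric mean of PPV and sensitivity (introduced by Deigan et al., 2009, PMID:19109441):

$$\mathrm{FMI} = \sqrt{\mathrm{PPV} \times \mathrm{sensitivity}} = \sqrt{\frac{P_{both}}{P_{both} + P_{pred}} \times \frac{P_{both}}{P_{both} + P_{ref}}}$$

where $P_{both}$ is the number of base-pairs common to both the reference and the predicted structure, while $P_{ref}$ and $P_{pred}$ are the numbers of base-pairs unique to the reference and to the predicted structure, respectively.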
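As a worked example, the FMI can be computed directly from the three counts defined above (Pboth, Pref and Ppred). This is a minimal sketch rather than RF Compare's own code: the function name `fmi` is hypothetical, and returning 0.0 when no base-pair is shared is an assumption made here to avoid a division by zero.

```python
import math


def fmi(p_both, p_ref, p_pred):
    """Geometric mean of PPV and sensitivity.

    p_both: base-pairs common to reference and predicted structure
    p_ref:  base-pairs unique to the reference structure
    p_pred: base-pairs unique to the predicted structure
    """
    if p_both == 0:
        # Assumption: no shared base-pairs gives an FMI of 0.
        return 0.0
    ppv = p_both / (p_both + p_pred)
    sensitivity = p_both / (p_both + p_ref)
    return math.sqrt(ppv * sensitivity)


# 2 shared pairs, 2 unique to the reference, 1 unique to the prediction:
# PPV = 2/3, sensitivity = 1/2, FMI = sqrt(1/3)
print(f"{fmi(2, 2, 1):.3f}")  # 0.577
```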
Modified Fowlkes-Mallows index (mFMI)
A variant of the FMI (introduced by Lan et al., 2022, PMID:35236847), which also rewards bases that are unpaired both in the reference and in the predicted structure:
where u is the number of unpaired bases common to both reference and predicted structure.
Usage
$ rf-compare [options] -r reference.db structures.db
$ rf-compare [options] -r reference_structs/ structures.db
$ rf-compare [options] -r reference.db structures/
To list all available parameters, simply type:
$ rf-compare -h
| Parameter | Type | Description |
|---|---|---|
| -p or --processors | int | Number of processors to use (≥ 1; Default: 1) |
| -r or --reference | string | Path to a reference structure file, or to a folder of structure files. Note: files containing multiple structures are accepted |
| -g or --img | | Enables generation of secondary structure comparison plots (requires R) |
| -o or --output-dir | string | Images output directory (Default: rf_compare/, requires -g) |
| -ow or --overwrite | | Overwrites output directory (if the specified path already exists) |
| -x or --relaxed | | Uses relaxed criteria (described in Deigan et al., 2009) to calculate PPV and sensitivity |
| -kp or --keep-pseudoknots | | Keeps pseudoknotted base-pairs in reference structure |
| -kl or --keep-lonelypairs | | Keeps isolated base-pairs (helices of length 1 bp) in reference structure |
| -i or --ignore-sequence | | Ignores sequence differences (e.g. SNVs) between the compared structures |
| -R or --R-path | string | Path to R executable (Default: assumes R is in PATH). Note: also check $RF_RPATH under Environment variables |
Note
When the --relaxed parameter is specified, a base-pair i-j is considered to be present in the reference structure if any of the following pairs exists: i/j; i-1/j; i+1/j; i/j-1; i/j+1. For additional details, please refer to Deigan et al., 2009 (PMID: 19109441).
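The relaxed rule can be sketched in a few lines of Python. This only illustrates the criterion as stated in the note, and it is not taken from RF Compare's source: the function name `is_pair_in_reference` and the representation of the reference as a set of `(i, j)` tuples are assumptions.

```python
def is_pair_in_reference(i, j, reference_pairs, relaxed=False):
    """Check whether the base-pair i-j counts as present in the reference.

    reference_pairs: set of (i, j) tuples describing the reference structure.
    """
    if not relaxed:
        return (i, j) in reference_pairs
    # Relaxed criteria: either partner may be shifted by one position.
    candidates = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return any(pair in reference_pairs for pair in candidates)


reference_pairs = {(5, 20)}
print(is_pair_in_reference(6, 20, reference_pairs))                # False
print(is_pair_in_reference(6, 20, reference_pairs, relaxed=True))  # True
print(is_pair_in_reference(6, 21, reference_pairs, relaxed=True))  # False
```

Note that a pair shifted on both sides at once, such as 6-21 against a reference pair 5-20, is not in the list above and therefore does not match.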