RF Eval evaluates the agreement between one or more secondary structures and the corresponding XML reactivity files.
Reference structures can be provided either in Vienna format (dot-bracket notation) or in CT format. A single file containing the structures of multiple transcripts can be provided:

# Vienna format

>Transcript#1
AAAAAAAAAAAAAAAAAAAAUUUUUUUUUUUUUUUUUUUUU
.((((((((((((((((((....))))))))))))))))))
>Transcript#2
CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGG
(((((((((((((((((...)))))))))))))))))
>Transcript#3
GCUAGCUAGCUAGCUAGCUAGUCAAGACGAGUCGAUGCU
(((((((((....))))))))).................

Important

The IDs of the provided structures must match the file names of the reactivity XML files (e.g. "Transcript#1" expects an XML file named "Transcript#1.xml").
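To illustrate the ID-to-file-name rule, here is a minimal Python sketch that reads a multi-record Vienna file and lists the XML file name each record would need. It is not part of RF Eval: the function names `parse_vienna` and `expected_xml_names` are hypothetical, and the sketch assumes each record consists of exactly three lines (header, sequence, structure).

```python
def parse_vienna(text):
    """Parse multi-record Vienna text into {transcript_id: (sequence, structure)}.

    Assumption: every record is exactly three non-empty lines
    (">ID", sequence, dot-bracket structure).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("each record needs a header, a sequence and a structure line")

    records = {}
    for header, sequence, structure in zip(lines[0::3], lines[1::3], lines[2::3]):
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header, got {header!r}")
        if len(sequence) != len(structure):
            raise ValueError(f"sequence/structure length mismatch in {header!r}")
        records[header[1:]] = (sequence, structure)
    return records


def expected_xml_names(records):
    """Map each transcript ID to the reactivity file name it is expected to match."""
    return {tid: f"{tid}.xml" for tid in records}


example = ">Transcript#1\nGGGAAACCC\n(((...)))\n>Transcript#2\nAAAA\n....\n"
records = parse_vienna(example)
print(expected_xml_names(records))
```

Checking the returned names against the contents of the reactivity folder before a run is one way to catch ID mismatches early.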


Metrics

RF Eval computes 3 metrics of agreement between reactivity data and structure. All 3 metrics range from 0 to 1, with 0 representing 0% agreement and 1 representing 100% agreement.

[1] Unpaired coefficient

This is the simplest metric and it measures the fraction of highly reactive bases (bases whose reactivity exceeds a user-defined threshold t) that are unpaired in the secondary structure:

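$$C = \frac{1}{|k|} \sum_{i \in k} \begin{cases} 1 & \text{if } i \in u \\ 0 & \text{if } i \in p \end{cases}$$

where k is the set of bases having reactivity > t (|k| being its size), while u and p are respectively the sets of unpaired and paired bases in the structure.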
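The definition above can be sketched in a few lines of Python. This is an illustration rather than RF Eval's own implementation: the function name `unpaired_coefficient` is hypothetical, the structure is assumed to be a dot-bracket string in which any character other than `.` marks a paired base, and missing reactivities are assumed to be encoded as `float("nan")`.

```python
def unpaired_coefficient(reactivities, structure, cutoff=0.7):
    """Fraction of highly reactive bases (reactivity > cutoff) that are unpaired.

    Returns None when no base exceeds the cutoff.
    """
    # NaN > cutoff is False, so bases without a reactivity value are skipped.
    highly_reactive = [i for i, r in enumerate(reactivities) if r > cutoff]
    if not highly_reactive:
        return None
    unpaired = sum(1 for i in highly_reactive if structure[i] == ".")
    return unpaired / len(highly_reactive)


structure = "((..)).."
reactivities = [0.1, 0.9, 0.8, 1.0, 0.2, 0.0, 0.75, float("nan")]
print(unpaired_coefficient(reactivities, structure))  # 0.75
```

In this toy example four bases exceed the 0.7 cutoff and three of them are unpaired, giving 3/4.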

[2] Data-Structure Correlation Index (DSCI)

This metric was originally proposed by Lan et al., 2021 (doi: 10.1101/2020.06.29.178343) and it is closely related to the Mann-Whitney U statistic. The DSCI is defined as the probability that a randomly chosen unpaired base will have greater reactivity than a randomly chosen paired base:

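$$DSCI = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \begin{cases} 1 & \text{if } p_i < u_j \\ 0 & \text{if } p_i \geq u_j \end{cases}$$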


where p is the set of reactivities for all m paired bases, while u is the set of reactivities for all n unpaired bases.
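As a rough illustration of this definition, the following Python sketch compares every paired base against every unpaired base. The function name `dsci` is hypothetical, any character other than `.` in the dot-bracket string is treated as paired, and missing reactivities are assumed to be `float("nan")`; none of this is taken from RF Eval's source.

```python
import math


def dsci(reactivities, structure):
    """Probability that a random unpaired base is more reactive than a random paired base.

    Returns None if either class is empty.
    """
    paired, unpaired = [], []
    for r, s in zip(reactivities, structure):
        if math.isnan(r):
            continue  # skip bases without a reactivity value
        (unpaired if s == "." else paired).append(r)
    if not paired or not unpaired:
        return None
    # Count (paired, unpaired) pairs in which the unpaired base is strictly more reactive.
    wins = sum(1 for p in paired for u in unpaired if p < u)
    return wins / (len(paired) * len(unpaired))


print(dsci([0.1, 0.3, 0.9, 0.2, 0.0, 0.5], "((..))"))  # 0.75
```

Here 6 of the 8 paired/unpaired comparisons favour the unpaired base. The double loop is quadratic in transcript length, which is acceptable for a sketch but could be replaced by a rank-based computation for long transcripts.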

[3] Area Under the Receiver Operating Characteristic Curve (AUROC)

This metric is typically used to assess the performance of a binary classifier at varying threshold values.
Briefly, the reactivity threshold t is increased from 0 to 1 in 0.005 increments. At each threshold, the True Positive Rate (TPR) is calculated as:

$$TPR = \frac{TP}{P}$$


where TP is the number of unpaired bases whose reactivity ≥ t, and P is the total number of unpaired bases in the structure.
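The True Negative Rate (TNR) is instead calculated as:

$$TNR = \frac{TN}{N}$$

where TN is the number of paired bases whose reactivity < t, and N is the total number of paired bases in the structure. The False Positive Rate (FPR) is then obtained as FPR = 1 - TNR, i.e. the fraction of paired bases whose reactivity ≥ t.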

The AUROC is then defined as the area under the curve described by the set of FPR-TPR value pairs at each value of t.
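The threshold sweep can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not RF Eval's implementation: the function name `auroc` and its `steps` parameter are hypothetical, the FPR is taken as the fraction of paired bases whose reactivity is ≥ t, the area is integrated with the trapezoidal rule, and missing reactivities are assumed to be `float("nan")`. No extra anchor points are added beyond the swept thresholds.

```python
import math


def auroc(reactivities, structure, steps=200):
    """Area under the FPR-TPR curve, sweeping t from 0 to 1 in 1/steps increments."""
    paired, unpaired = [], []
    for r, s in zip(reactivities, structure):
        if math.isnan(r):
            continue  # skip bases without a reactivity value
        (unpaired if s == "." else paired).append(r)
    if not paired or not unpaired:
        return None

    points = []
    for k in range(steps + 1):
        t = k / steps  # 0, 0.005, ..., 1 when steps == 200
        tpr = sum(r >= t for r in unpaired) / len(unpaired)
        fpr = sum(r >= t for r in paired) / len(paired)
        points.append((fpr, tpr))

    # FPR never increases as t grows, so each trapezoid has non-negative width.
    area = 0.0
    for (fpr_a, tpr_a), (fpr_b, tpr_b) in zip(points, points[1:]):
        area += (fpr_a - fpr_b) * (tpr_a + tpr_b) / 2
    return area


print(auroc([0.1, 0.2, 0.9, 0.8, 0.0, 0.3], "((..))"))  # 1.0
```

In the example every unpaired base is more reactive than every paired base, so the curve reaches TPR = 1 while FPR is still 0 and the area is 1.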

[Figure: AUROC]

Usage

To list the required parameters, simply type:

$ rf-eval -h
| Parameter | Type | Description |
| --- | --- | --- |
| -s or --structures | string | Path to a (folder of) structure file(s) |
| -r or --reactivities | string | Path to a (folder of) XML reactivity file(s) |
| -o or --output | string | Output file with metrics per transcript (Default: rf_eval.txt) |
| -ow or --overwrite | | Overwrites output file (if the specified file already exists) |
| -p or --processors | int | Number of processors to use (≥1, Default: 1) |
| -tu or --terminal-as-unpaired | | Treats terminal base-pairs as if they were unpaired. Note: this parameter and -it are mutually exclusive |
| -it or --ignore_terminal | | Terminal base-pairs are excluded from calculations. Note: this parameter and -tu are mutually exclusive |
| -kl or --keep-lonelypairs | | Lonely base-pairs (helices of 1 bp) are retained |
| -kp or --keep-pseudoknots | | Pseudoknotted base-pairs are retained |
| -c or --reactivity-cutoff | | Cutoff for considering a base highly-reactive when computing the unpaired coefficient (>0, Default: 0.7) |