RF Eval evaluates the agreement between one or more secondary structures and the corresponding XML reactivity files.
Reference structures can be provided either in Vienna format (dot-bracket notation) or in CT format. A single file containing the structures of multiple transcripts can be provided:
# Vienna format
>Transcript#1
AAAAAAAAAAAAAAAAAAAAUUUUUUUUUUUUUUUUUUUUU
.((((((((((((((((((....))))))))))))))))))
>Transcript#2
CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGG
(((((((((((((((((...)))))))))))))))))
>Transcript#3
GCUAGCUAGCUAGCUAGCUAGUCAAGACGAGUCGAUGCU
(((((((((....))))))))).................
Important
The IDs of the provided structures must match the file name of the reactivity XML file (e.g. "Transcript#1" expects an XML file named "Transcript#1.xml")
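As an illustration of the naming rule above, the following minimal Python sketch parses a multi-record Vienna file and derives the XML file name expected for each structure ID. The function names `read_vienna` and `expected_xml_names` are hypothetical helpers, not part of RF Eval, and the parser assumes each record consists of exactly three lines (header, sequence, structure).

```python
def read_vienna(text):
    """Parse multi-record dot-bracket text into {id: (sequence, structure)}.

    Assumption: every record is a '>' header line followed by one sequence
    line and one structure line; blank lines and '#' comment lines are skipped.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if len(lines) % 3 != 0:
        raise ValueError("each record needs a header, a sequence and a structure")
    records = {}
    for i in range(0, len(lines), 3):
        header, sequence, structure = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header, got: {header!r}")
        records[header[1:]] = (sequence, structure)
    return records


def expected_xml_names(records):
    """Map each structure ID to the reactivity file name it must match."""
    return {transcript_id: f"{transcript_id}.xml" for transcript_id in records}


example = """# Vienna format
>Transcript#1
GGGAAACCC
(((...)))
>Transcript#2
AAAA
....
"""
records = read_vienna(example)
print(expected_xml_names(records))
```

Comparing the values of `expected_xml_names` against the contents of the reactivity folder is a quick way to spot structures that would have no matching XML file.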
Metrics
RF Eval computes 3 metrics of agreement between reactivity data and structure. All 3 metrics yield values between 0 and 1, with 0 representing 0% agreement and 1 representing 100% agreement.
[1] Unpaired coefficient
This is the simplest metric and it measures the fraction of highly reactive bases (bases whose reactivity exceeds a user-defined threshold t) that are unpaired in the secondary structure:
$$\text{Unpaired coefficient} = \frac{|k \cap u|}{|k \cap u| + |k \cap p|}$$
where k is the set of bases having reactivity > t, while u and p are respectively the sets of unpaired and paired bases in the structure.
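The definition above can be sketched in a few lines of Python. This is an illustrative sketch, not RF Eval's own code: the function name `unpaired_coefficient` is hypothetical, reactivities are assumed to be a list of floats aligned with a dot-bracket string, and missing values are assumed to be encoded as NaN (which never exceeds the cutoff and is therefore ignored).

```python
def unpaired_coefficient(reactivities, structure, cutoff=0.7):
    """Fraction of bases with reactivity > cutoff that are unpaired ('.').

    Assumption: any character other than '.' marks a paired base.
    Returns NaN when no base exceeds the cutoff.
    """
    # Structure symbols of the highly reactive bases (the set k)
    high = [s for r, s in zip(reactivities, structure) if r > cutoff]
    if not high:
        return float("nan")
    return sum(1 for s in high if s == ".") / len(high)


# Three bases exceed 0.7 (positions 0, 1 and 3); two of them are unpaired
print(unpaired_coefficient([0.9, 0.8, 0.1, 0.95, 0.2], "(...)"))
```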
[2] Data-Structure Correlation Index (DSCI)
This metric was originally proposed by Lan et al., 2021 (doi: 10.1101/2020.06.29.178343) and it is closely related to the Mann-Whitney U statistic. The DSCI is defined as the probability that a randomly chosen unpaired base will have greater reactivity than a randomly chosen paired base:
$$\mathrm{DSCI} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbb{1}\left[u_j > p_i\right]$$
where p is the set of reactivities for all m paired bases, while u is the set of reactivities for all n unpaired bases, and the indicator equals 1 when the unpaired base is more reactive than the paired base and 0 otherwise.
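A direct, brute-force reading of this definition can be sketched as follows. The function name `dsci` is hypothetical, NaN is assumed to mark missing reactivities, and ties are assumed to count as 0 (strictly "greater"); RF Eval's own treatment of ties and missing values may differ.

```python
def dsci(reactivities, structure):
    """Probability that a random unpaired base is more reactive than a
    random paired base, computed over all unpaired/paired combinations.

    Assumptions: '.' marks unpaired bases, any other symbol marks paired
    bases, NaN values are skipped, and ties contribute 0.
    """
    # r == r is False only for NaN, so this drops missing values
    unpaired = [r for r, s in zip(reactivities, structure) if s == "." and r == r]
    paired = [r for r, s in zip(reactivities, structure) if s != "." and r == r]
    if not unpaired or not paired:
        return float("nan")
    wins = sum(1 for u in unpaired for p in paired if u > p)
    return wins / (len(unpaired) * len(paired))


# Paired reactivities: 0.6, 0.2; unpaired reactivities: 0.9, 0.1, 0.5
print(dsci([0.6, 0.9, 0.1, 0.5, 0.2], "(...)"))
```

The double loop is quadratic in the transcript length; for long transcripts a rank-based computation, as used for the Mann-Whitney U statistic, would give the same value more efficiently.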
[3] Area Under the Receiver Operating Characteristic Curve (AUROC)
This metric is typically employed to assess the performance of a binary classifier model at varying threshold values.
Briefly, the reactivity threshold t is increased from 0 to 1 in 0.005 increments. At each threshold, the True Positive Rate (TPR) is calculated as:
$$\mathrm{TPR} = \frac{TP}{P}$$
where TP is the number of unpaired bases whose reactivity ≥ t, and P is the total number of unpaired bases in the structure.
The False Positive Rate (FPR) is instead calculated as:
$$\mathrm{FPR} = \frac{FP}{N}$$
where FP is the number of paired bases whose reactivity ≥ t, and N is the total number of paired bases in the structure.
The AUROC is then defined as the area under the curve described by the set of FPR-TPR value pairs obtained at each value of t.
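The threshold sweep described above can be sketched in Python as follows. This is a hedged illustration rather than RF Eval's implementation: the function name `auroc` is hypothetical, reactivities are assumed to lie between 0 and 1 with NaN marking missing values, paired bases at or above the threshold are counted as false positives, the area is computed with the trapezoidal rule, and the point (0, 0) is added to close the curve.

```python
def auroc(reactivities, structure, step=0.005):
    """Area under the ROC curve obtained by sweeping a reactivity threshold
    from 0 to 1 in fixed increments.

    Assumptions: '.' marks unpaired bases (positives), any other symbol
    marks paired bases (negatives), and NaN values are skipped.
    """
    unpaired = [r for r, s in zip(reactivities, structure) if s == "." and r == r]
    paired = [r for r, s in zip(reactivities, structure) if s != "." and r == r]
    if not unpaired or not paired:
        return float("nan")

    points = {(0.0, 0.0)}  # assumption: anchor the curve at the origin
    for i in range(round(1 / step) + 1):
        t = i * step
        tpr = sum(r >= t for r in unpaired) / len(unpaired)
        fpr = sum(r >= t for r in paired) / len(paired)
        points.add((fpr, tpr))

    # Both rates decrease as t grows, so sorting recovers the curve order
    curve = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(curve, curve[1:]))


# Unpaired bases (0.9, 0.8) are perfectly separated from paired ones (0.1, 0.2)
print(auroc([0.1, 0.9, 0.8, 0.2], "(..)"))
```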
Usage
To list the required parameters, simply type:
$ rf-eval -h
| Parameter | Type | Description |
| --- | --- | --- |
| -s or --structures | string | Path to a (folder of) structure file(s) |
| -r or --reactivities | string | Path to a (folder of) XML reactivity file(s) |
| -o or --output | string | Output file with metrics per transcript (Default: rf_eval.txt) |
| -ow or --overwrite | | Overwrites output file (if the specified file already exists) |
| -p or --processors | int | Number of processors to use (≥1, Default: 1) |
| -tu or --terminal-as-unpaired | | Treats terminal base-pairs as if they were unpaired. Note: this parameter and -it are mutually exclusive |
| -it or --ignore_terminal | | Terminal base-pairs are excluded from calculations. Note: this parameter and -tu are mutually exclusive |
| -kl or --keep-lonelypairs | | Lonely base-pairs (helices of 1 bp) are retained |
| -kp or --keep-pseudoknots | | Pseudoknotted base-pairs are retained |
| -c or --reactivity-cutoff | | Cutoff for considering a base highly-reactive when computing the unpaired coefficient (>0, Default: 0.7) |
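To make the notion of a lonely base-pair (a helix of 1 bp) concrete, the sketch below marks such pairs as unpaired in a dot-bracket string, which is presumably what happens when lonely pairs are not retained. This is an assumption-laden illustration: the function names `pair_table` and `remove_lonely_pairs` are hypothetical, only `(` and `)` are handled (no pseudoknot brackets), and RF Eval's actual procedure may differ.

```python
def pair_table(structure):
    """Return {i: j} for every paired position of a dot-bracket string."""
    stack, pairs = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def remove_lonely_pairs(structure):
    """Replace base-pairs that do not stack on a neighbouring pair with dots.

    Assumption: a pair (i, j) is lonely when neither (i-1, j+1) nor
    (i+1, j-1) is also a base-pair.
    """
    pairs = pair_table(structure)
    result = list(structure)
    for i, j in pairs.items():
        if i > j:
            continue  # visit each pair once, from its 5' side
        stacked = pairs.get(i - 1) == j + 1 or pairs.get(i + 1) == j - 1
        if not stacked:
            result[i] = result[j] = "."
    return "".join(result)


# The outermost pair has no stacking neighbour, so it is marked unpaired
print(remove_lonely_pairs("(..((...))..)"))
```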