RF Eval evaluates the agreement between a (set of) secondary structure(s) and a (set of) XML reactivity file(s).
Reference structures can be provided either in Vienna format (dot-bracket notation) or in CT format. A single file can contain the structures of multiple transcripts:

# Vienna format

>Transcript#1
AAAAAAAAAAAAAAAAAAAAUUUUUUUUUUUUUUUUUUUUU
.((((((((((((((((((....))))))))))))))))))
>Transcript#2
CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGG
(((((((((((((((((...)))))))))))))))))
>Transcript#3
GCUAGCUAGCUAGCUAGCUAGUCAAGACGAGUCGAUGCU
(((((((((....))))))))).................

Important

The IDs of the provided structures must match the file names of the XML reactivity files (e.g., "Transcript#1" expects an XML file named "Transcript#1.xml").
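To illustrate the multi-transcript Vienna layout and the ID-to-file-name rule, the Python sketch below parses such a file and derives the XML file name each structure would be matched with. It assumes every record spans exactly three lines (header, sequence, structure) with no line wrapping; `parse_vienna` and `expected_xml_name` are hypothetical helpers, not part of RF Eval.

```python
from typing import Dict, Tuple


def parse_vienna(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse multi-record dot-bracket text into {transcript_id: (sequence, structure)}.

    Assumes every record is exactly three non-empty lines:
    '>ID', the sequence, and the dot-bracket structure (no line wrapping).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected 3-line records: header, sequence, structure")
    records: Dict[str, Tuple[str, str]] = {}
    for k in range(0, len(lines), 3):
        header, sequence, structure = lines[k], lines[k + 1], lines[k + 2]
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header line, got {header!r}")
        if len(sequence) != len(structure):
            raise ValueError(f"{header[1:]}: sequence and structure lengths differ")
        records[header[1:]] = (sequence, structure)
    return records


def expected_xml_name(transcript_id: str) -> str:
    # Structure IDs must match reactivity file names: "Transcript#1" -> "Transcript#1.xml"
    return f"{transcript_id}.xml"


demo = ">hairpin\nGGGGAAAACCCC\n((((....))))\n"
for tid in parse_vienna(demo):
    print(tid, "->", expected_xml_name(tid))  # hairpin -> hairpin.xml
```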

Metrics

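RF Eval computes three metrics of agreement between reactivity data and structure. All three yield values between 0 and 1, with 0 representing complete disagreement and 1 representing complete agreement. For the DSCI and the AUROC, a value of 0.5 corresponds to reactivities that do not discriminate unpaired from paired bases.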

[1] Unpaired coefficient

This is the simplest metric: it measures the fraction of highly reactive bases (bases whose reactivity exceeds a user-defined threshold t, set with `-c`) that are unpaired in the secondary structure:

C = \frac{1}{|k|} \sum_{i \in k} \begin{cases} 1 & \text{if } i \in u \\ 0 & \text{if } i \in p \end{cases}

where k is the set of bases having reactivity > t (|k| being its size), while u and p are the sets of unpaired and paired bases in the structure, respectively.
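A minimal Python sketch of the unpaired coefficient, assuming a dot-bracket structure in which `.` marks unpaired bases (any other character counts as paired), reactivities given as a list aligned to the sequence, and NaN for bases without data. The function name `unpaired_coefficient`, the NaN handling, and returning NaN when no base exceeds t are illustrative assumptions, not RF Eval's implementation.

```python
import math
from typing import Sequence


def unpaired_coefficient(reactivity: Sequence[float], structure: str, t: float = 0.7) -> float:
    """Fraction of bases with reactivity > t that are unpaired in a dot-bracket structure.

    Assumptions: '.' marks an unpaired base and every other character marks a paired
    base; NaN reactivities (no data) are skipped; NaN is returned if no base exceeds t.
    """
    if len(reactivity) != len(structure):
        raise ValueError("reactivity and structure must have the same length")
    # k: positions of highly reactive bases
    k = [i for i, r in enumerate(reactivity) if not math.isnan(r) and r > t]
    if not k:
        return math.nan
    in_u = sum(1 for i in k if structure[i] == ".")  # highly reactive AND unpaired
    return in_u / len(k)


react = [0.1, 0.9, 0.8, 1.0, 0.75, 0.2, 0.05]
print(unpaired_coefficient(react, "((...))"))  # 0.75 (3 of the 4 bases above 0.7 are unpaired)
```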

[2] Data-Structure Correlation Index (DSCI)

This metric was originally proposed by Lan et al., 2021 (doi: 10.1101/2020.06.29.178343) and is closely related to the Mann-Whitney U statistic. The DSCI is defined as the probability that a randomly chosen unpaired base has a higher reactivity than a randomly chosen paired base:

\mathrm{DSCI} = \frac{1}{mn} \sum_{i = 1}^{m} \sum_{j = 1}^{n} \begin{cases} 1 & \text{if } p_{i} < u_{j} \\ 0 & \text{if } p_{i} \geq u_{j} \end{cases}

where p is the set of reactivities for all m paired bases, while u is the set of reactivities for all n unpaired bases.
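The DSCI formula can be evaluated literally as a double loop over paired and unpaired reactivities. The Python sketch below does exactly that, assuming a dot-bracket structure where `.` marks unpaired bases (any other character counts as paired) and NaN marks bases without data; `dsci` is a hypothetical function name, not RF Eval's code.

```python
import math
from typing import Sequence


def dsci(reactivity: Sequence[float], structure: str) -> float:
    """DSCI: probability that a random unpaired base is more reactive than a random paired one.

    Direct O(m*n) evaluation of the double sum; ties (p_i == u_j) score 0, as in the formula.
    Assumes '.' = unpaired, any other dot-bracket character = paired; NaN values are skipped.
    """
    if len(reactivity) != len(structure):
        raise ValueError("reactivity and structure must have the same length")
    p = [r for r, s in zip(reactivity, structure) if not math.isnan(r) and s != "."]
    u = [r for r, s in zip(reactivity, structure) if not math.isnan(r) and s == "."]
    if not p or not u:
        return math.nan  # undefined without both paired and unpaired bases
    favourable = sum(1 for p_i in p for u_j in u if p_i < u_j)
    return favourable / (len(p) * len(u))


react = [0.1, 0.9, 0.8, 1.0, 0.75, 0.2, 0.05]
print(dsci(react, "((...))"))  # 10 of the 12 comparisons favour the unpaired base
```

The double loop costs O(mn). For long transcripts the same count can be obtained by sorting, as is usual for the Mann-Whitney U statistic, keeping in mind that the standard U statistic scores ties as 0.5, whereas the formula above scores them as 0.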

[3] Area Under the Receiver Operating Characteristic Curve (AUROC)

This metric is typically employed to assess the performance of a binary classifier at varying threshold values. Here, reactivity is used to classify bases as unpaired (positives) or paired (negatives).
Briefly, the reactivity threshold t is increased stepwise from 0 to 1, in 0.005 increments. At each threshold, the True Positive Rate (TPR) is calculated as:

\mathrm{TPR} = \frac{\mathrm{TP}}{P}

where TP is the number of unpaired bases whose reactivity is ≥ t, and P is the total number of unpaired bases in the structure.
The False Positive Rate (FPR) is instead calculated as:

\mathrm{FPR} = \frac{\mathrm{FP}}{N}

where FP is the number of paired bases whose reactivity is ≥ t, and N is the total number of paired bases in the structure.

The AUROC is then defined as the area under the curve traced by the (FPR, TPR) pairs obtained at each value of t.
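The threshold sweep can be sketched in Python as follows: one (FPR, TPR) point is computed per threshold t = 0, 0.005, …, 1, and the resulting curve is integrated. Using the trapezoidal rule, closing the curve at (0, 0) and (1, 1), treating `.` as unpaired, and skipping NaN values are all assumptions of this sketch; `roc_points` and `auroc` are hypothetical names, and RF Eval's exact integration scheme may differ.

```python
import math
from typing import List, Sequence, Tuple


def roc_points(reactivity: Sequence[float], structure: str, step: float = 0.005) -> List[Tuple[float, float]]:
    """(FPR, TPR) at thresholds t = 0, step, 2*step, ..., 1 (a base counts as positive if r >= t)."""
    if len(reactivity) != len(structure):
        raise ValueError("reactivity and structure must have the same length")
    # (reactivity, is_unpaired) for bases with data; '.' = unpaired (assumption)
    bases = [(r, s == ".") for r, s in zip(reactivity, structure) if not math.isnan(r)]
    n_pos = sum(1 for _, unpaired in bases if unpaired)  # P: unpaired bases
    n_neg = len(bases) - n_pos                            # N: paired bases
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both paired and unpaired bases")
    points = []
    for k in range(round(1 / step) + 1):
        t = k * step  # integer counter avoids accumulating float error
        tp = sum(1 for r, unpaired in bases if unpaired and r >= t)
        fp = sum(1 for r, unpaired in bases if not unpaired and r >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(reactivity: Sequence[float], structure: str, step: float = 0.005) -> float:
    """Trapezoidal area under the ROC curve, closed at (0, 0) and (1, 1) (assumption)."""
    # Both rates only fall as t rises, so sorting by (FPR, TPR) walks the curve left to right
    pts = sorted(roc_points(reactivity, structure, step) + [(0.0, 0.0), (1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


react = [0.1, 0.9, 0.8, 1.0, 0.75, 0.2, 0.05]
print(auroc(react, "((...))"))  # ~0.8333
```

In this toy example no paired and unpaired reactivities are tied and the 0.005 grid separates all values, so the area equals 10/12, the same value the DSCI formula gives for these data.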


Usage

To list all available parameters, type:

$ rf-eval -h

Parameter	Type	Description
-s or --structures	string	Path to a (folder of) structure file(s)
-r or --reactivities	string	Path to a (folder of) XML reactivity file(s)
-o or --output	string	Output file with metrics per transcript (Default: rf_eval.txt)
-ow or --overwrite		Overwrites output file (if the specified file already exists)
-p or --processors	int	Number of processors to use (≥1, Default: 1)
-tu or --terminal-as-unpaired		Treats terminal base-pairs as if they were unpaired. Note: this parameter and `-it` are mutually exclusive.
-it or --ignore_terminal		Terminal base-pairs are excluded from calculations. Note: this parameter and `-tu` are mutually exclusive.
-kl or --keep-lonelypairs		Lonely base-pairs (helices of 1 bp) are retained
-kp or --keep-pseudoknots		Pseudoknotted base-pairs are retained
-c or --reactivity-cutoff	float	Cutoff for considering a base highly reactive when computing the unpaired coefficient (>0, Default: 0.7)
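The `-kl` option refers to lonely base pairs, i.e. helices of a single base pair. As a hedged illustration of that definition, the Python sketch below finds base pairs (i, j) with no stacked neighbour, meaning neither (i-1, j+1) nor (i+1, j-1) is paired. It only handles nested `()` pairs (pseudoknot brackets such as `[]` are ignored), and `base_pairs` and `lonely_pairs` are hypothetical helpers, not RF Eval code.

```python
from typing import List, Set, Tuple


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) pairs (0-based, i < j) from a dot-bracket string; only '()' is read."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def lonely_pairs(structure: str) -> List[Tuple[int, int]]:
    """Pairs with no stacked neighbour on either side, i.e. helices of exactly 1 bp."""
    pairs = base_pairs(structure)
    paired: Set[Tuple[int, int]] = set(pairs)
    return [
        (i, j)
        for i, j in pairs
        if (i - 1, j + 1) not in paired and (i + 1, j - 1) not in paired
    ]


print(lonely_pairs("((...)).(...)"))  # [(8, 12)]
```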