The RF Fold module is designed to allow transcriptome-wide reconstruction of RNA structures, starting from XML files generated using the RF Norm tool. This tool can process a single XML file or an entire directory of XML files, and produces the inferred secondary structures (in either dot-bracket notation or CT format) and their graphical representation (in either PostScript or SVG format).
Folding inference can be performed using two different algorithms:

1. ViennaRNA
2. RNAstructure

Prediction can be performed either on the whole transcript, or through a windowed approach (see next paragraph).

Windowed folding

The windowed folding approach is inspired by the original method described in Siegfried et al., 2014 (PMID: 25028896). Since version 2.8.0, the underlying logic of the windowed approach has been slightly changed, by performing the detection of pseudoknots as the last step. The procedure is outlined below:

[Figure: RNA Framework windowed folding pipeline]

In step I, a window is slid along the RNA, and the partition function is calculated. If provided, soft-constraints from structure probing are applied. Predicted base-pair probabilities are averaged across all windows in which they appear, and base-pairs with >99% probability are retained and hard-constrained to be paired in step II.
In step II, a window is slid along the RNA, and MFE folding is performed, including (where present) soft-constraints from probing data, and base-pairs from step I. Predicted base-pairs are retained if they appear in >50% of analyzed windows.
In step III (optional), a window is slid along the RNA, and putative pseudoknots are detected using the same approach employed by the ShapeKnots algorithm (Hajdin et al., 2013 (PMID: 23503844)). Our implementation of the ShapeKnots algorithm relies on the ViennaRNA package (instead of RNAstructure as in the original implementation), thus it is much faster:

[Figure: ShapeKnots/RNA Framework comparison]

Nonetheless, both algorithms are single-threaded. Alternatively, the multi-threaded implementation ShapeKnots-smp shipped with the latest RNAstructure version can be used.
If constraints from structure probing experiments are provided, these are incorporated in the form of soft-constraints. Predicted pseudoknotted base-pairs are retained if they appear in >50% of analyzed windows and if they do not clash with the nested base-pairs identified in step II. If structure probing constraints are provided, pseudoknots are retained only if the average reactivity of bases on both sides of the pseudoknotted helix is below a certain reactivity cutoff.
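
The consensus logic of steps I and II can be illustrated with a short Python sketch. This is not the RF Fold implementation: the function names, the 0-based half-open window coordinates, and the way the last window is placed flush with the 3' end are all assumptions made for illustration. Window trimming and the increased sampling at transcript ends are left out. Only the thresholds (>99% averaged probability, >50% of windows) come from the description above.

```python
from collections import defaultdict

def sliding_windows(length, size=600, offset=200):
    """Return (start, end) windows (0-based, half-open) covering the RNA."""
    if length <= size:
        return [(0, length)]
    starts = list(range(0, length - size + 1, offset))
    if starts[-1] + size < length:
        starts.append(length - size)  # assumption: last window ends at the 3' end
    return [(s, s + size) for s in starts]

def average_pair_probabilities(window_probs):
    """Step I: average each pair's probability over the windows where it appeared."""
    totals, counts = defaultdict(float), defaultdict(int)
    for probs in window_probs:  # one {(i, j): probability} dict per window
        for pair, p in probs.items():
            totals[pair] += p
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}

def high_confidence_pairs(averaged, cutoff=0.99):
    """Step I: pairs that would be hard-constrained in the MFE step."""
    return {pair for pair, p in averaged.items() if p > cutoff}

def majority_pairs(windows, window_pairs, min_fraction=0.5):
    """Step II: keep pairs predicted in more than min_fraction of the
    windows that span both bases (assumed meaning of 'analyzed windows')."""
    kept = set()
    for i, j in set().union(*window_pairs):
        spanning = [k for k, (s, e) in enumerate(windows) if s <= i and j < e]
        hits = sum(1 for k in spanning if (i, j) in window_pairs[k])
        if spanning and hits / len(spanning) > min_fraction:
            kept.add((i, j))
    return kept
```

With a 1000 nt transcript and the default 600 nt window and 200 nt offset, `sliding_windows` yields three windows; a pair predicted in two of the three windows spanning it passes the majority filter, while a pair predicted in one of two does not.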

Note

At all stages, increased sampling is performed at the 5'/3'-ends to avoid end biases.

Along with the predicted structure, the windowed method also produces a WIGGLE track file containing per-base Shannon entropies.
Regions with higher Shannon entropies are likely to form alternative structures, while those with low Shannon entropies correspond to regions with well-defined RNA structures, or persistent single-strandedness (Siegfried et al., 2014).
Shannon entropy is calculated as:

H_{i} = - \sum_{j = 1}^{J} p_{i,j} \log_{10} p_{i,j}

where p_{i,j} is the probability of base i being base-paired to base j, over all its potential J pairing partners.
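
As a worked illustration of the formula, the following Python sketch computes per-base Shannon entropies from a table of base-pairing probabilities. The function name and the input layout (a dictionary keyed by 1-based position pairs) are assumptions chosen for the example, not part of RF Fold.

```python
import math

def shannon_entropy(pair_probs, length):
    """Per-base Shannon entropy (log base 10), following the formula above.

    pair_probs: {(i, j): probability}, with 1-based positions and i < j.
    Each pair contributes to the entropy of both of its bases.
    """
    entropy = [0.0] * length
    for (i, j), p in pair_probs.items():
        if p > 0:  # p * log(p) tends to 0 as p tends to 0
            term = -p * math.log10(p)
            entropy[i - 1] += term
            entropy[j - 1] += term
    return entropy

# Base 1 pairs with base 4 or base 5 with equal probability
values = shannon_entropy({(1, 4): 0.5, (1, 5): 0.5}, length=5)
```

In this toy case base 1 has two equally likely partners, so its entropy is log10(2) ≈ 0.301, while a base that is paired with a single partner at probability 1 (or never paired) has an entropy of 0.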
Since version 2.5, RF Fold generates vector graphical reports (SVG format) for each structure, reporting the per-base reactivity, the MEA structure, the per-base Shannon entropy, and the base-pairing probabilities.

Note

The calculation of Shannon entropy and base-pairing probabilities requires the partition function to be computed. Since this is a very slow step, partition function folding is performed only in windowed mode, or if parameters -dp (or --dotplot) or -sh (or --shannon-entropy) are explicitly specified.

Usage

To list the required parameters, simply type:

$ rf-fold -h

Parameter	Type	Description
-o or --output-dir	string	Output directory for writing inferred structures (Default: rf_fold/)
-ow or --overwrite		Overwrites the output directory if it already exists
-ct or --connectivity-table		Writes predicted structures in CT format (Default: Dot-bracket notation)
-m or --folding-method	int	Folding method (1-2, Default: 1): 1. ViennaRNA 2. RNAstructure
-p or --processors	int	Number of processors (threads) to use (Default: 1)
-g or --img		Enables the generation of graphical reports
-t or --temperature	float	Temperature in Celsius degrees (Default: 37.0)
-sl or --slope	float	Sets the slope used with structure probing data restraints (Default: 1.8 [kcal/mol])
-in or --intercept	float	Sets the intercept used with structure probing data restraints (Default: -0.6 [kcal/mol])
-md or --maximum-distance	int	Maximum pairing distance (in nt) between transcript's residues (Default: 0 [no limit])
-nlp or --no-lonelypairs		Disallows lonely base-pairs (1 bp helices) inside predicted structures
-i or --ignore-reactivity		Ignores XML reactivity data when performing folding (MFE unconstrained prediction)
-hc or --hard-constraint		Besides performing soft-constraint folding, allows specifying a reactivity cutoff (specified by `-f`) for hard-constraining a base to be single-stranded
-c or --constraints	string	Path to a directory containing constraint files (in dot-bracket notation), that will be used to enforce specific base-pairs in the structure models
-f or --cutoff	float	Reactivity cutoff for constraining a position as unpaired (>0, Default: 0.7)
-w or --windowed		Enables windowed folding
-pt or --partition	string	Path to RNAstructure `partition` executable (Default: assumes `partition` is in PATH) Note: by default, `partition-smp` will be used (if available)
-pp or --probabilityplot	string	Path to RNAstructure `ProbabilityPlot` executable (Default: assumes `ProbabilityPlot` is in PATH)
-fw or --fold-window	int	Window size (in nt) for performing MFE folding (>=50, Default: 600)
-fo or --fold-offset	int	Offset (in nt) for MFE folding window sliding (Default: 200)
-pw or --partition-window	int	Window size (in nt) for performing partition function (>=50, Default: 600)
-po or --partition-offset	int	Offset (in nt) for partition function window sliding (Default: 200)
-wt or --window-trim	int	Number of bases to trim from both ends of the partition windows to avoid end biases (Default: 100)
-dp or --dotplot		Enables generation of dot-plots of base-pairing probabilities
-sh or --shannon-entropy		Enables generation of a WIGGLE track file with per-base Shannon entropies
-pmr or --plot-median-react		Plots the difference between the transcript's median reactivity and the median reactivity in sliding windows
-pms or --plot-median-shannon		Plots the difference between the transcript's median Shannon entropy and the median Shannon entropy in sliding windows
-pk or --pseudoknots		Enables detection of pseudoknots (computationally intensive)
-ksl or --pseudoknot-slope	float	Sets slope used for pseudoknots prediction (Default: same as `-sl <slope>`)
-kin or --pseudoknot-intercept	float	Sets intercept used for pseudoknots prediction (Default: same as `-in <intercept>`)
-kp1 or --pseudoknot-penality1	float	Pseudoknot penalty P1 (Default: 0.35)
-kp2 or --pseudoknot-penality2	float	Pseudoknot penalty P2 (Default: 0.65)
-kt or --pseudoknot-tollerance	float	Maximum tolerated deviation of suboptimal structures energy from MFE (>0-1, Default: 0.5 [50%])
-kh or --pseudoknot-helices	int	Number of candidate pseudoknotted helices to evaluate (>0, Default: 100)
-kw or --pseudoknot-window	int	Window size (in nt) for performing pseudoknots detection (>=50, Default: 600)
-ko or --pseudoknot-offset	int	Offset (in nt) for pseudoknots detection window sliding (Default: 200)
-kc or --pseudoknot-cutoff	float	Reactivity cutoff for retaining a pseudoknotted helix (0-1, Default: 0.5)
-km or --pseudoknot-method	int	Algorithm for pseudoknots prediction (1-2, Default: 1): 1. RNA Framework 2. ShapeKnots Note: the chosen folding method (specified by `-m`) affects the algorithm used by RNA Framework (pseudoknot detection method #1) to define the initial MFE structure
		RNA Framework pseudoknots detection algorithm options
-vrs or --vienna-rnasubopt	string	Path to ViennaRNA `RNAsubopt` executable (Default: assumes `RNAsubopt` is in PATH)
-ks or --pseudoknot-suboptimal	int	Number of suboptimal structures to evaluate for pseudoknots prediction (>0, Default: 1000)
-nz or --no-zuker		Disables the inclusion of Zuker suboptimal structures (reduces the sampled folding space)
-zs or --zuker-suboptimal	int	Number of Zuker suboptimal structures to include (>0, Default: 1000)
		ShapeKnots pseudoknots detection algorithm options
-sk or --shapeknots	string	Path to `ShapeKnots` executable (Default: assumes `ShapeKnots` is in PATH) Note: by default, `ShapeKnots-smp` will be used (if available)
		Folding method #1 options (ViennaRNA)
-vrf or --vienna-rnafold	string	Path to ViennaRNA `RNAfold` executable (Default: assumes `RNAfold` is in PATH)
-ngu or --no-closing-gu		Disallows G:U wobbles at the end of helices
-cm or --constraint-method	int	Method for converting provided reactivities into pseudo-energies (1-2, Default: 1): 1. Deigan et al., 2009 2. Zarringhalam et al., 2012
		*Zarringhalam et al., 2012 method options*
-cc or --constraint-conversion	int	Method for converting `rf-norm` reactivities into pairing probabilities (1-5, Default: 1): 1. Skip normalization step (reactivities are treated as pairing probabilities) 2. Linear mapping according to Zarringhalam et al., 2012 3. Use a cutoff to divide nucleotides into paired, and unpaired 4. Linear model for converting reactivities into probabilities of being unpaired 5. Linear model for converting the logarithm of reactivities into probabilities of being unpaired
-bf or --beta-factor	float	Sets the magnitude of penalties for deviations from the observed pairing probabilities (Default: 0.5)
-ms or --model-slope	float	Sets the slope used by the linear model (Default: 0.68 [Method #4], or 1.6 [Method #5]; requires `-cc 4` or `-cc 5`)
-mi or --model-intercept	float	Sets the intercept used by the linear model (Default: 0.2 [Method #4], or -2.29 [Method #5]; requires `-cc 4` or `-cc 5`)
		Folding method #2 options (RNAstructure)
-rs or --rnastructure	string	Path to RNAstructure `Fold` executable (Default: assumes `Fold` is in PATH) Note: by default, `Fold-smp` will be used (if available)
-d or --data-path	string	Path to RNAstructure data tables (Default: assumes DATAPATH environment variable is already set)
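
When RF Fold is driven from a script, the options in the table can be assembled programmatically. The Python sketch below builds an argument list using only flags documented above. The helper name is hypothetical, `rf_norm_out/` is a placeholder directory name, and placing the XML input as the last argument is an assumption that should be checked against the output of `rf-fold -h`.

```python
import shlex

def build_rf_fold_command(xml_input, output_dir="rf_fold/", processors=1,
                          windowed=False, pseudoknots=False,
                          shannon=False, dotplot=False, img=False):
    """Assemble an rf-fold argument list from options listed in the table."""
    cmd = ["rf-fold", "-o", output_dir, "-p", str(processors)]
    flags = [("-w", windowed), ("-pk", pseudoknots), ("-sh", shannon),
             ("-dp", dotplot), ("-g", img)]
    cmd += [flag for flag, enabled in flags if enabled]
    cmd.append(xml_input)  # ASSUMPTION: the input path is the last argument
    return cmd

# Hypothetical run: windowed folding on 4 threads, with Shannon entropy output
command = build_rf_fold_command("rf_norm_out/", processors=4,
                                windowed=True, shannon=True)
print(shlex.join(command))  # shell-quoted form, e.g. for logging
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with paths containing spaces.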

Information

For additional details on ViennaRNA soft-constraint prediction methods, please refer to the ViennaRNA documentation, or to Lorenz et al., 2016 (PMID: 26353838).
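
To make the role of the slope (`-sl`) and intercept (`-in`) parameters concrete, the sketch below applies the pseudo-free energy relation published by Deigan et al., 2009, in which the energy term for a base is slope × ln(reactivity + 1) + intercept. The function name is an assumption, and the sketch only evaluates the formula; how RF Fold passes these values to the folding engine is not shown here.

```python
import math

def deigan_pseudo_energy(reactivity, slope=1.8, intercept=-0.6):
    """Pseudo-free energy (kcal/mol) for one base, after Deigan et al., 2009.

    Defaults mirror the -sl and -in defaults listed in the parameter table.
    """
    return slope * math.log(reactivity + 1.0) + intercept

# Unreactive bases receive a stabilizing bonus, reactive bases a penalty
low = deigan_pseudo_energy(0.0)   # equals the intercept
high = deigan_pseudo_energy(1.5)  # positive, i.e. pairing is penalized
```

With the default parameters, the term changes sign at a reactivity of exp(0.6 / 1.8) − 1 ≈ 0.40, so bases below that value are favored to pair and bases above it are disfavored.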

Information

For additional details on ShapeKnots pseudoknots detection parameters, please refer to Hajdin et al., 2013 (PMID: 23503844).

Constraint files

Constraint files allow forcing base-pairing of certain positions in the RNA. These files are standard dot-bracket files and they must be named after the transcript ID used in the corresponding XML files (for instance, if the XML file is named XYZ.xml, the module will look for an XYZ.db file in the constraint folder):

>XYZ
UUUCGUACGUAGCGAGCGAGUAGCUGAUGCUGAUAGCGGCGAUGCUAGCUGAUCGUAGCGCGCGAUCGAUCGAUGC
..(((.............................................................))).......

In the above example, the constraint file instructs the module to force the base-pairing between positions 3-69, 4-68 and 5-67 of the XYZ transcript.

Information

At present, only nested base-pairs are allowed. Pseudoknotted helices will be automatically discarded.
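
Before running RF Fold, it can be useful to check which base-pairs a constraint file actually encodes. The Python sketch below parses a three-line dot-bracket record and lists its nested pairs as 1-based positions. The function name is an assumption, and symbols other than parentheses are simply treated as unpaired, which is a simplification of the behavior described above.

```python
def read_constraint(text):
    """Parse a dot-bracket record into (transcript_id, sequence, pairs)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 3 or not lines[0].startswith(">"):
        raise ValueError("expected a header, a sequence and a structure line")
    transcript_id, sequence, structure = lines[0][1:], lines[1], lines[2]
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    stack, pairs = [], []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % position)
            pairs.append((stack.pop(), position))
        # any other symbol is treated as unpaired in this sketch
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return transcript_id, sequence, sorted(pairs)

# The XYZ record shown above, rebuilt piece by piece for readability
SEQUENCE = ("UUUCGUACGUAGCGAGCGAGUAGCUGAUGCUGAUAGCGG"
            "CGAUGCUAGCUGAUCGUAGCGCGCGAUCGAUCGAUGC")
STRUCTURE = ".." + "(((" + "." * 61 + ")))" + "." * 7
name, seq, pairs = read_constraint(">XYZ\n" + SEQUENCE + "\n" + STRUCTURE)
```

For the XYZ record, `pairs` contains the three base-pairs 3-69, 4-68 and 5-67 described in the text.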

Output dot-plot files

When option -dp is provided, RF Fold produces a dot-plot file for each transcript being analyzed, with the following structure:

1549                                   # RNA's length
i       j       -log10(Probability)    # Header 
8       254     0.459355416499312
9       253     0.446335563943221
10      252     0.456738523239413
11      251     0.454733421725068
12      250     0.46965667808714
13      249     0.47837140333524
21      35      0.268192200569539
22      34      0.0183400615262171
23      33      0.0166665677814708
24      32      0.0128927546134575
25      31      0.0148601207296645
26      30      0.0252017532628297

-- cut --

1497    1510    0.0147874890078331
1498    1509    0.0102803152157546
1499    1508    0.0137510190884233
1500    1507    0.0402352346970943

where i and j are the positions (1-based) of the bases involved in a given base-pair, followed by the -log₁₀ of their base-pairing probability.
These files can be easily viewed using the Integrative Genomics Viewer (IGV) (for additional details, please refer to the official Broad Institute's IGV page).
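
For downstream analysis outside IGV, the dot-plot values can be converted back to probabilities with p = 10^(−value). The Python sketch below reads the layout shown above (length line, header line, then one base-pair per row). The function name is an assumption, and the sketch drops anything after a `#` in case explanatory comments like those in the listing are present.

```python
def read_dotplot(text):
    """Return (rna_length, {(i, j): probability}) from dot-plot file content."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments, if any
        if line:
            rows.append(line.split())
    rna_length = int(rows[0][0])
    probabilities = {}
    for fields in rows[2:]:  # rows[1] is the "i j -log10(Probability)" header
        i, j, neg_log10_p = int(fields[0]), int(fields[1]), float(fields[2])
        probabilities[(i, j)] = 10.0 ** -neg_log10_p
    return rna_length, probabilities

# First rows of the example above
EXAMPLE = """1549
i       j       -log10(Probability)
8       254     0.459355416499312
22      34      0.0183400615262171
"""
length, probs = read_dotplot(EXAMPLE)
```

In this example the 8-254 base-pair has a probability of roughly 0.35, whereas the 22-34 base-pair, with a value close to 0, has a probability of roughly 0.96.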