The RF StructExtract module allows extracting (portions of) individual structure elements from a structure model generated using rf-fold, on the basis of specific selection criteria such as size, median reactivity, median Shannon entropy, presence of a multiway junction, or thermodynamic stability higher than expected by chance.
Usage
To list the required parameters, simply type:
$ rf-structextract -h
| Parameter | Type | Description |
| --- | --- | --- |
| -p or --processors | int | Number of processors (threads) to use (Default: 1) |
| -ro or --rffoldOut | string | Path to the output folder generated by rf-fold, containing the structures to be parsed |
| -xf or --xmlFolder | string | Path to the output folder generated by rf-norm, containing the reactivities in XML format |
| -o or --output | string | Output folder (Default: rf_structextract/) |
| -ow or --overwrite | | Overwrites the output directory if it already exists |
| -w or --winSize | int | Window size (in nt) for calculating the median reactivity and Shannon entropy (Default: 50) |
| -ml or --minTranscriptLen | int | The low-reactivity and low-Shannon evaluation is skipped for transcripts shorter than this length (Default: 500) |
| -ir or --ignoreReact | | Skips the low-reactivity evaluation |
| -is or --ignoreShannon | | Skips the low-Shannon evaluation |
| -mv or --minValueFrac | float | Windows in which less than this fraction of bases is covered are set to NaN (Default: 0.4 [40%]) |
| -mb or --minBelowMedian | float | Structure elements having less than this fraction of bases whose Shannon entropy and reactivity are below the global transcript median are discarded (Default: 0.7 [70%]) |
| -mp or --minPairedFrac | float | Structure elements having less than this fraction of paired bases are discarded (Default: 0.45 [45%]) |
| -mm or --minMotifLen | int | Structure elements below this length are discarded (Default: 50) |
| -xm or --maxMotifLen | int | Structure elements above this length are discarded (Default: no limit) |
| -xl or --maxLoopSize | int | Structure elements encompassing a loop larger than this number of bases are discarded (Default: no limit) |
| -mo or --multiwayOnly | | Only reports structure elements encompassing multiway junctions |
| -opf or --onePerFile | | Extracted structure elements belonging to the same transcript are reported in separate files |
| -ee or --evalEnergy | | Only structures having a free energy significantly lower than expected by chance are reported. Note #1: this is estimated by randomly shuffling the underlying sequence N times (where N is controlled via the nShufflings parameter) and by calculating the probability associated with the corresponding Z-score. Note #2: this procedure significantly slows down the analysis |
| -v or --pvalue | float | P-value threshold for considering the energy of a structure significantly lower than expected by chance (0-1, Default: 0.05) |
| -ns or --nShufflings | int | Number of times a sequence must be shuffled (>=1, Default: 100) |
| -ds or --dinuclShuffle | | Sequences are shuffled taking care to preserve their dinucleotide frequencies (slower) |
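
To make the size and pairing filters in the table more concrete, the following Python sketch shows one way they could be evaluated on a single structure element. It is only an illustration, not the module's implementation: it assumes the element is given as a pseudoknot-free dot-bracket string, and the function names `loop_sizes` and `passes_filters` are hypothetical. The keyword arguments mirror minPairedFrac, minMotifLen, maxMotifLen and maxLoopSize.

```python
def loop_sizes(db):
    """Count the unpaired bases directly enclosed by each base pair.

    Assumes a pseudoknot-free dot-bracket string. For a junction, the value
    is the total number of unpaired residues lying in the junction loop.
    """
    stack = []   # one counter of unpaired bases per currently open base pair
    sizes = []
    for ch in db:
        if ch == "(":
            stack.append(0)
        elif ch == ")":
            sizes.append(stack.pop())
        elif stack:
            # Unpaired base belonging to the innermost open base pair
            stack[-1] += 1
    return sizes


def passes_filters(db, min_paired_frac=0.45, min_len=50, max_len=None, max_loop=None):
    """Apply the length, paired-fraction and loop-size filters to one element."""
    length = len(db)
    if length < min_len:
        return False
    if max_len is not None and length > max_len:
        return False
    paired_frac = (db.count("(") + db.count(")")) / length
    if paired_frac < min_paired_frac:
        return False
    if max_loop is not None and max(loop_sizes(db), default=0) > max_loop:
        return False
    return True


# Hypothetical three-way junction, used only to exercise the functions
element = "((..((...))..((...))..))"
largest_loop = max(loop_sizes(element))
kept = passes_filters(element, min_len=10, max_loop=5)
```

In this toy element the junction loop contains six unpaired residues, so a maxLoopSize of 5 rejects it, while leaving maxLoopSize unset keeps it.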
Understanding the algorithm
The aim of the module is to extract high-confidence RNA structure elements, which are more likely to be functionally relevant. The algorithm first identifies independently folded structural domains (that is, regions of the transcript whose folding is independent of that of the rest of the transcript) and then, starting from the innermost loop, identifies the individual structure motifs by bidirectional extension. The extension stops when one or more of the user-defined criteria are not met, that is, in any of the following cases:

- If the motif falls within a region of high Shannon entropy or high reactivity. Briefly, the smoothed Shannon entropy is calculated along the entire transcript by sliding a centered window of ±winSize / 2 nucleotides in 1 nt increments and by calculating the median Shannon entropy within each window. If the fraction of bases in the window having non-NaN values is < minValueFrac, the window median is set to NaN. The same operation is repeated to calculate a smoothed reactivity. When a structure motif is extracted, the algorithm compares the smoothed Shannon entropy and reactivity across all the bases encompassed by the motif to the median Shannon entropy and reactivity of the entire transcript. The motif is retained, and the extension continued, if the fraction of bases falling below the transcript's median is ≥ minBelowMedian.
  It is essential to note that, in order to evaluate the Shannon entropy, the rffoldOut directory passed to the module must contain the shannon/ folder. This folder is only generated when invoking the rf-fold module with the shannonentropy flag (more details can be found in the manual page of rf-fold). Evaluation of the Shannon entropy can be turned off by enabling the ignoreShannon flag. Similarly, in order to evaluate the reactivity, the xmlFolder of XML reactivity profiles must be provided. Evaluation of the reactivity can be turned off by enabling the ignoreReact flag. Both Shannon entropy and reactivity evaluation are automatically skipped for transcripts shorter than minTranscriptLen.
- If the fraction of base-paired positions in the motif is < minPairedFrac
- If the motif is shorter than minMotifLen (in which case the extension continues, if possible)
- If the motif is longer than maxMotifLen
- If the motif encompasses a loop larger than maxLoopSize (Note: in the case of junctions, the size of the loop is calculated as the number of unpaired residues residing in the junction loop)
- If the motif has a free energy higher than expected by chance. Briefly, if the evalEnergy flag is enabled, the sequence of the motif is randomly shuffled nShufflings times, and the probability of obtaining by chance a structure having a free energy ≤ that of the original motif is calculated from the corresponding Z-score. If the probability is ≥ pvalue, the motif is discarded. When the dinuclShuffle flag is enabled, the sequence of the motif is shuffled in such a way that the dinucleotide frequencies are preserved.
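
The energy criterion in the last item can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the module's code: `toy_energy` is a hypothetical stand-in for a real folding energy (in practice this would come from an RNA folding program), `energy_pvalue` is a hypothetical name, the shuffle is a plain mononucleotide shuffle (the dinucleotide-preserving variant is not shown), and the probability is taken as the lower tail of a normal distribution at the observed Z-score.

```python
import math
import random
import statistics

# Toy pairing scores, used only so that the example is self-contained
PAIR_SCORE = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2,
              ("U", "A"): 2, ("G", "U"): 1, ("U", "G"): 1}


def toy_energy(seq):
    """Hypothetical energy: pair base i with its mirror position (a single hairpin)."""
    n = len(seq)
    score = sum(PAIR_SCORE.get((seq[i], seq[n - 1 - i]), 0) for i in range(n // 2))
    return -float(score)  # more pairing gives a lower (more stable) energy


def energy_pvalue(seq, energy_fn, n_shufflings=100, seed=0):
    """Probability of a shuffled sequence reaching an energy <= the observed one."""
    rng = random.Random(seed)
    observed = energy_fn(seq)
    shuffled = []
    for _ in range(n_shufflings):
        bases = list(seq)
        rng.shuffle(bases)  # mononucleotide shuffle
        shuffled.append(energy_fn("".join(bases)))
    mean = statistics.fmean(shuffled)
    sd = statistics.pstdev(shuffled)
    if sd == 0:
        return 1.0  # no spread among shuffles: treat the motif as not significant
    z = (observed - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))  # lower-tail normal probability


p = energy_pvalue("GGGGGGGGAAAACCCCCCCC", toy_energy, n_shufflings=100)
keep = p < 0.05  # the motif is discarded when the probability is >= pvalue
```

Fixing the random seed keeps the sketch reproducible. With a single shuffle the standard deviation is zero, so the sketch conservatively returns a probability of 1.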
The example figure shows the effect of smoothing reactivities and Shannon entropy. The red dashed lines correspond to the median reactivity and median Shannon entropy along the entire transcript. The inset further shows an independently folded structural domain. The green dots mark the loops that represent the possible starting points for the bidirectional extension and motif extraction. In this example, only the two motifs colored in green (marked #1 and #2, respectively) are reported, as they fall inside regions of low reactivity and low Shannon entropy. The base pairs marked in red are not part of motif #1 because, if they were included, the reactivity would exceed the global median reactivity for more than a fraction 1 - minBelowMedian of the bases.
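
The smoothing and the below-median test described above can be sketched in Python as follows. This is a rough illustration with assumptions of its own: the names `smooth` and `fraction_below_median` are hypothetical, windows are simply truncated at the transcript ends, the covered fraction is computed over the truncated window, and the global median is taken over the raw (unsmoothed) per-base values. The module itself may handle these details differently.

```python
import math
import statistics


def smooth(values, win_size=50, min_value_frac=0.4):
    """Median of a centered window of +/- win_size / 2 positions, in 1 nt steps."""
    half = win_size // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]  # truncated at the ends
        valid = [v for v in window if not math.isnan(v)]
        if not valid or len(valid) / len(window) < min_value_frac:
            smoothed.append(math.nan)  # too few covered bases in this window
        else:
            smoothed.append(statistics.median(valid))
    return smoothed


def fraction_below_median(smoothed, global_median, start, end):
    """Fraction of bases in [start, end) whose smoothed value is below the median."""
    motif = smoothed[start:end]
    if not motif:
        return 0.0
    below = sum(1 for v in motif if not math.isnan(v) and v < global_median)
    return below / len(motif)


# Toy profile: a low-signal half followed by a high-signal half
raw = [0.1] * 10 + [0.9] * 10
profile = smooth(raw, win_size=4)
median = statistics.median(v for v in raw if not math.isnan(v))

frac_low = fraction_below_median(profile, median, 0, 10)    # motif in the low half
frac_mixed = fraction_below_median(profile, median, 5, 15)  # motif spanning both halves
```

With the default minBelowMedian of 0.7, the first toy motif would be retained and the second one would not, which mirrors how the red base pairs in the figure are excluded from motif #1.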