The RF Norm module takes one (Rouskin and Zubradt methods), two (Ding and Siegfried methods), or three (Siegfried method) RC files generated by the RF Count module, and performs normalization to obtain transcriptome-wide per-base reactivities.
Reactivity scores can be computed using 4 methods:

Scoring of RT-stops/nuclease cuts-based methods

[1] Ding et al., 2014 (PMID: 24270811)

Per-base signal is calculated as the ratio of the natural log (ln) of the raw count of RT-stops/nuclease cuts at a given position of a transcript, to the average of the ln of RT-stops/nuclease cuts along the whole transcript:

$U_{i} = \frac{\ln (n_{U i} + p)}{(\sum_{j = 0}^{l} \frac{\ln (n_{U j} + p)}{l})}$

$T_{i} = \frac{\ln (n_{T i} + p)}{(\sum_{j = 0}^{l} \frac{\ln (n_{T j} + p)}{l})}$
where n_Ui and n_Ti are respectively the raw read counts in the untreated (or RNase V1) and treated (DMS, CMCT, SHAPE, or Nuclease S1) samples at position i of the transcript, l is the transcript’s length, and p is a pseudocount added to deal with non-covered regions. U_i and T_i are respectively the normalized number of RT-stops/nuclease cuts at position i in the untreated and treated samples.
Score at position i is then calculated as:

$S_{i} = \max(0, (T_{i} - U_{i}))$
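
The scoring above can be sketched in a few lines of Python. This is a minimal illustration of the formulas, not RF Norm's actual implementation: the function name `ding_scores` is hypothetical, and returning all zeros when the mean log count is 0 is an assumption made here to avoid a division by zero.

```python
import math


def ding_scores(treated, untreated, pseudocount=1.0):
    """Per-base scores S_i = max(0, T_i - U_i) from raw RT-stop counts.

    `treated` and `untreated` are equal-length lists of raw counts.
    """
    if len(treated) != len(untreated) or not treated:
        raise ValueError("count lists must be non-empty and of equal length")

    def normalize(counts):
        # ln(n_i + p) divided by the transcript-wide mean of ln(n_j + p)
        logs = [math.log(n + pseudocount) for n in counts]
        mean_log = sum(logs) / len(logs)
        if mean_log == 0:
            # Assumption: an uncovered transcript yields zero signal
            return [0.0] * len(logs)
        return [x / mean_log for x in logs]

    t_norm = normalize(treated)
    u_norm = normalize(untreated)
    return [max(0.0, t - u) for t, u in zip(t_norm, u_norm)]
```

For example, with treated counts `[0, 3, 0, 7]`, untreated counts `[1, 1, 1, 1]`, and a pseudocount of 1, the untreated profile normalizes to 1 at every position, and only the two positions with treated RT-stops receive a positive score.
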
[2] Rouskin et al., 2014 (PMID: 24336214)

The untreated sample is not considered. Per-base RT-stops/nuclease cuts are used as a direct measure of the raw signal.

Warning

Data scored with the Rouskin method can only be normalized using the 90% Winsorizing approach.

Scoring of mutational profiling-based methods

[3] Siegfried et al., 2014 (PMID: 25028896)

This method takes into account both an untreated sample, and (optionally) a denatured control sample.
Per-base raw signal is calculated as:

$S_{i} = \frac{\frac{n_{T i}}{c_{T i}} - \frac{n_{U i}}{c_{U i}}}{\frac{n_{D i}}{c_{D i}}}$

where n_Ti, n_Ui, and n_Di are respectively the mutation counts in the treated, untreated, and denatured samples at position i of the transcript, while c_Ti, c_Ui, and c_Di are respectively the reads covering position i of the transcript in the treated, untreated, and denatured samples.
If no denatured control sample is provided, raw reactivities are simply calculated as:

$S_{i} = \frac{n_{T i}}{c_{T i}} - \frac{n_{U i}}{c_{U i}}$
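
As an illustration, both forms of the Siegfried raw signal can be computed as follows. This is a sketch under stated assumptions rather than RF Norm's own code: the function name `siegfried_scores` is hypothetical, and reporting NaN where coverage (or the denatured mutation rate) is 0 is a choice made here to keep the division well defined.

```python
import math


def siegfried_scores(n_t, c_t, n_u, c_u, n_d=None, c_d=None):
    """Raw per-base signal from mutation counts (n_*) and coverage (c_*).

    The denatured sample (n_d, c_d) is optional, as in the formulas above.
    """
    scores = []
    for i in range(len(n_t)):
        if c_t[i] == 0 or c_u[i] == 0:
            scores.append(math.nan)  # assumption: uncovered bases become NaN
            continue
        signal = n_t[i] / c_t[i] - n_u[i] / c_u[i]
        if n_d is not None and c_d is not None:
            if c_d[i] == 0 or n_d[i] == 0:
                signal = math.nan  # assumption: undefined denatured rate
            else:
                signal /= n_d[i] / c_d[i]
        scores.append(signal)
    return scores
```

With 10 mutations over 100 reads in the treated sample and 2 over 100 in the untreated sample, the raw signal is 0.08. Adding a denatured sample with 20 mutations over 100 reads scales it to 0.4.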

[4] Zubradt et al., 2016 (PMID: 27819661)

The untreated sample is not considered. Per-base raw signal is calculated as:

$S_{i} = \frac{n_{T i}}{c_{T i}}$
where n_Ti and c_Ti are respectively the mutation count and the read coverage at position i of the transcript.

Normalization of raw reactivities

Raw reactivity scores can be normalized using 3 different approaches:

Method	Description
2-8% Normalization	From the top 10% of values, the top 2% is ignored, then any reactivity value along the entire transcript is divided by the average of the remaining 8%
90% Winsorizing	Each reactivity value above the 95th percentile is set to the 95th percentile and each reactivity value below the 5th percentile is set to the 5th percentile, then the reactivity at each position of the transcript is divided by the value of the 95th percentile
Box-plot Normalization	Values greater than 1.5x the interquartile range (numerical distance between the 25th and 75th percentiles) above the 75th percentile are removed. After excluding these outliers, the next 10% of remaining reactivities are averaged, and all reactivities (including outliers) are divided by this value.
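
The three approaches in the table can be sketched as below. This is an illustrative reading of the descriptions, not RF Norm's implementation. The function names are hypothetical, the percentile helper uses linear interpolation between ranks (an assumed convention), cutoffs such as "top 2%" are obtained by truncating to an integer count (also an assumption), and the input is assumed to contain only numeric values for the reactive bases (no NaNs).

```python
def percentile(values, p):
    """p-th percentile with linear interpolation (assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to summarize")
    rank = (len(ordered) - 1) * p / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def norm_2_8(values):
    """Divide by the mean of the top 10% after ignoring the top 2%."""
    ranked = sorted(values, reverse=True)
    skip = int(len(ranked) * 0.02)                 # top 2%, ignored
    stop = max(skip + 1, int(len(ranked) * 0.10))  # end of the top 10%
    factor = sum(ranked[skip:stop]) / (stop - skip)
    return [v / factor for v in values]


def norm_winsorizing(values):
    """Clamp to the 5th-95th percentile range, divide by the 95th."""
    low, high = percentile(values, 5), percentile(values, 95)
    return [min(max(v, low), high) / high for v in values]


def norm_boxplot(values):
    """Drop outliers above Q3 + 1.5*IQR, divide by the mean of the next 10%."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    cutoff = q3 + 1.5 * (q3 - q1)
    kept = sorted((v for v in values if v <= cutoff), reverse=True)
    top = max(1, int(len(kept) * 0.10))
    factor = sum(kept[:top]) / top
    return [v / factor for v in values]  # outliers are scaled too
```

Note that only the Winsorizing approach bounds the output to a maximum of 1; with the other two, values above the normalization factor end up greater than 1.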

Normalized reactivities can be further remapped to values ranging from 0 to 1 according to Zarringhalam et al., 2012 (PMID: 23091593). In this approach, values < 0.25 are linearly mapped to [0-0.35[, values ≥ 0.25 and < 0.3 are linearly mapped to [0.35-0.55[, values ≥ 0.3 and < 0.7 are linearly mapped to [0.55-0.85[, and values ≥ 0.7 are linearly mapped to [0.85-1].
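
The piecewise-linear remapping described above can be sketched as follows. The function name `remap_reactivity` and the `upper` parameter are hypothetical: the text does not state which input value is mapped to 1, so this sketch assumes an upper bound of 1.0 and clamps inputs to the range [0, `upper`].

```python
def remap_reactivity(value, upper=1.0):
    """Map a normalized reactivity to [0, 1] using four linear segments.

    Assumption: inputs are clamped to [0, upper], and `upper` maps to 1.
    """
    if upper <= 0.7:
        raise ValueError("upper must be greater than 0.7")
    value = min(max(value, 0.0), upper)
    # (input start, input end, output start, output end)
    segments = [
        (0.0, 0.25, 0.0, 0.35),
        (0.25, 0.3, 0.35, 0.55),
        (0.3, 0.7, 0.55, 0.85),
        (0.7, upper, 0.85, 1.0),
    ]
    for in_lo, in_hi, out_lo, out_hi in segments:
        # The last segment also includes its upper end
        if value < in_hi or in_hi == upper:
            fraction = (value - in_lo) / (in_hi - in_lo)
            return out_lo + fraction * (out_hi - out_lo)
```

For instance, a normalized reactivity of 0.5 lies halfway through the third segment and is remapped to 0.7.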

Sliding-window normalization

RF Norm supports data normalization in sliding windows. Windows can be either static (default) or dynamic. When a window size is chosen, normalization is performed by sliding a window of that size along the transcript by the chosen offset. While the choice of window type is irrelevant for SHAPE data, it becomes particularly important when dealing with base-specific reagents. Consider the example below, in which an RNA has been modified by DMS, which only reacts with A and C residues.

[Figure: Static vs. Dynamic windows]

In the above example, the use of static windows of size 10 nt results in an erroneous overestimation of base reactivities for certain residues (marked in red). This happens because A/C residues are unevenly distributed along the transcript, so certain windows contain far fewer than 50% A/C bases (contrary to what would be expected by chance). The use of dynamic windows of size 10 nt avoids this overestimation instead, as the window's size is dynamically adjusted to always include 10 A/C residues.
The overestimation effect can also be minimized by increasing the size of static windows.
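
The difference between the two window types can be illustrated with a short sketch. The function names are hypothetical, and the details are assumptions: windows are returned as half-open (start, end) coordinates, each dynamic window starts `offset` nt after the previous one, and the last window is allowed to hold fewer reactive bases if the transcript ends first. RF Norm's actual windowing logic may differ.

```python
def static_windows(length, size, offset):
    """Fixed-size windows of `size` nt, sliding by `offset` nt."""
    windows = []
    start = 0
    while start < length:
        windows.append((start, min(start + size, length)))
        if start + size >= length:
            break
        start += offset
    return windows


def dynamic_windows(sequence, size, offset, reactive="AC"):
    """Windows extended until they contain `size` reactive bases."""
    windows = []
    start = 0
    while start < len(sequence):
        end, count = start, 0
        while end < len(sequence) and count < size:
            if sequence[end] in reactive:
                count += 1
            end += 1
        windows.append((start, end))
        if end >= len(sequence):
            break
        start += offset
    return windows
```

On the toy sequence `ACACGGGGGGACAC`, static windows of 4 nt include one window (`GGGG`) with no A/C residues at all, whereas dynamic windows of 4 reactive bases stretch across the G-rich stretch until 4 A/C residues are collected.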

Usage

To list the required parameters, simply type:

$ rf-norm -h

Parameter	Type	Description
-u or --untreated	string	Path to the RC file for the non-treated/denatured (or RNase V1) sample (required by Ding/Siegfried scoring methods)
-t or --treated	string	Path to the RC file for the treated (or Nuclease S1) sample
-d or --denatured	string	Path to the RC file for the denatured sample (optional for Siegfried scoring method)
-i or --index	string[,string]	A comma-separated (no spaces) list of RCI index files for the provided RC files Note #1: RCI files must be provided in the order 1. Untreated/Denatured, 2. Treated Note #2: If a single RCI file is specified, it will be used for all RC files Note #3: If no RCI index is provided, it will be generated at runtime, and stored in the same folder as the untreated/denatured/treated samples
-p or --processors	int	Number of processors (threads) to use (Default: 1)
-o or --output-dir	string	Output directory for writing normalized data in XML format (Default: <treated>_vs_<untreated>_norm/ for Ding method or Siegfried method without denatured sample, <treated>_norm/ for Rouskin/Zubradt methods, <treated>_vs_<untreated>_<denatured>_norm/ for Siegfried method with denatured sample)
-ow or --overwrite		Overwrites the output directory if already exists
-c or --config-file	string	Path to a configuration file with normalization parameters (see Configuration files paragraph) Note #1: If the provided file exists, the loaded configuration will override any command-line specified parameter Note #2: If the provided file doesn’t exist, it will be generated using the specified command-line (or default) parameters
-sm or --scoring-method	int	Method for score calculation (1-4, Default: 1): 1. Ding et al., 2014 2. Rouskin et al., 2014 3. Siegfried et al., 2014 4. Zubradt et al., 2016
-nm or --norm-method	int	Method for signal normalization (1-3, Default: 1): 1. 2-8% Normalization 2. 90% Winsorizing 3. Box-plot Normalization
-r or --raw		Reports raw reactivities (skips data normalization)
-rm or --remap-reactivities		Remaps normalized reactivities to values ranging from 0 to 1 according to Zarringhalam et al., 2012
-rb or --reactive-bases	string	Reactive bases to consider for signal normalization (Default: N [ACGT]) Note: This parameter accepts any IUPAC code, or their combination (e.g. `-rb M`, or `-rb AC`). Any other base will be reported as NaN
-ni or --norm-independent		Each one of the reactive bases will be normalized independently (e.g. -rb AC -ni will independently normalize A and C residues)
-dw or --dynamic-window		When enabled, the normalization window is dynamically resized to include at least that number of reactive bases (e.g. `-rb AC -nw 50 -dw` instructs RF Norm to normalize reactivities in windows containing at least 50 A/C residues)
-nf or --norm-factor	float[,float]	When provided, this will be used as the normalization factor for all transcripts (default behavior is to calculate the normalization factor independently for each transcript) Note: 90% Winsorizing requires 2 normalization factors, provided as a comma-separated list, respectively corresponding to the 5th and 95th percentiles of the distribution of raw reactivities
-mc or --mean-coverage	float	Discards any transcript with mean coverage below this threshold (≥0, Default: 0)
-ec or --median-coverage	float	Discards any transcript with median coverage below this threshold (≥0, Default: 0)
-nw or --norm-window	int	Window size (in nt) for signal normalization (≥3, Default: whole transcript [Ding; Siegfried], 50 [Rouskin; Zubradt]) Note: a maximum window size of 30,000 nt is allowed when `-dw` (or `--dynamic-window`) is enabled
-wo or --window-offset	int	Offset (in nt) for window sliding during normalization (Default: none [Ding; Siegfried], 50 [Rouskin; Zubradt])
-D or --decimals	int	Number of decimals for reporting reactivities (1-10, Default: 3)
-n or --nan	int	Positions of the transcript with read coverage below this threshold will be reported as NaN in the reactivity profile (>0, Default: 10)
		*Scoring method #1 options (Ding et al., 2014)*
-pc or --pseudocount	float	Pseudocount added to reactivities to avoid division by 0 (>0, Default: 1)
-s or --max-score	float	Score threshold for capping raw reactivities (>0, Default: 10)
		*Scoring method #3 options (Siegfried et al., 2014)*
-mu or --max-untreated-mut	float	Maximum per-base mutation rate in untreated sample (≤1, Default: 0.05 [5%])
		Scoring methods #1 and #3 options (Ding et al., 2014 & Siegfried et al., 2014)
-il or --ignore-lower-than-untreated		Bases having raw reactivity in the treated sample lower than the untreated control, will be ignored (not used during reactivity normalization) and will be reported as NaNs
		Scoring methods #3 and #4 options (mutational profiling)
-mm or --max-mutation-rate	float	Maximum per-base mutation rate (≤1, Default: 1 [100%])

Configuration files

RF Norm configuration files are used to provide normalization parameters for the analysis, without the need to manually specify them from the command-line.
Configuration files are composed of a list of key/value pairs, separated by the equal sign (=), or by the colon punctuation mark (:). Keys and values are case-insensitive.
Accepted key/value pairs are:

Parameters	Accepted values	Default value
scoreMethod	"Ding" (or 1); "Rouskin" (or 2); "Siegfried" (or 3); "Zubradt" (or 4)	Ding
normMethod	"2-8%" (or 1); "90% Winsorizing" (or 2); "Box-plot" (or 3)	2-8%
reactiveBases	[ACGTURYSWKMBDHVN] (or "all")	all
normIndependent	TRUE/FALSE; Yes/No; 1/0	FALSE
normWindow	Positive integer ≥ 3	1e9 [Ding; Siegfried] 50 [Rouskin; Zubradt]
windowOffset	Positive integer > 0	1e9 [Ding; Siegfried] 50 [Rouskin; Zubradt]
meanCoverage	Positive float ≥ 0	0
medianCoverage	Positive float ≥ 0	0
remapReactivities	TRUE/FALSE; Yes/No; 1/0	FALSE
	Scoring method #1 options
maxScore	Positive float > 0	10
pseudoCount	Positive float > 0	1
	Scoring method #3 options
maxUntreatedMut	0 ≤ r ≤ 1	0.05
maxMutationRate	0 ≤ r ≤ 1	0.2

# A sample configuration file

scoreMethod=Ding
normMethod=2-8%
maxScore=10
pseudoCount=1
reactiveBases=N
normIndependent=FALSE
normWindow=1e9
windowOffset=1e9
meanCoverage=1
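
A configuration file in this format is straightforward to read programmatically. The sketch below is not part of RF Norm: the function name `parse_config` is hypothetical, treating lines that start with `#` as comments is an assumption based on the sample above, and values are returned as plain strings without validation.

```python
import re


def parse_config(text):
    """Parse key/value pairs separated by '=' or ':' into a dict.

    Keys are lower-cased because they are case-insensitive.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks comments
            continue
        match = re.match(r"^([^=:]+)[=:](.*)$", line)
        if match is None:
            raise ValueError(f"malformed configuration line: {raw_line!r}")
        key, value = match.group(1).strip(), match.group(2).strip()
        params[key.lower()] = value
    return params
```

Applied to the sample file above, this returns a dictionary such as `{"scoremethod": "Ding", "normmethod": "2-8%", ...}`.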

Output XML files

RF Norm produces an XML file for each transcript being analyzed, with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<data [attributes]>
    <transcript id="Transcript ID" length="Transcript length">
        <sequence>
            Transcript sequence
        </sequence>
        <reactivity>
            Comma-separated list of reactivity values
        </reactivity>
    </transcript>
</data>
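
Files with this structure can be loaded with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module. The function name `parse_rf_norm_xml` and the layout of the returned dictionary are hypothetical, and the code assumes one transcript per file with whitespace-padded content, as in the template above.

```python
import xml.etree.ElementTree as ET


def parse_rf_norm_xml(xml_text):
    """Extract attributes, sequence, and reactivities from one XML file."""
    # Parsing bytes lets the parser honor the encoding declaration
    root = ET.fromstring(xml_text.encode("utf-8"))
    transcript = root.find("transcript")
    if transcript is None:
        raise ValueError("no <transcript> element found")
    # Remove the indentation and line breaks surrounding the content
    sequence = "".join(transcript.findtext("sequence", "").split())
    raw_values = "".join(transcript.findtext("reactivity", "").split())
    # float() also accepts "NaN", used for positions without a value
    reactivity = [float(v) for v in raw_values.split(",") if v]
    return {
        "attributes": dict(root.attrib),
        "id": transcript.get("id"),
        "length": int(transcript.get("length")),
        "sequence": sequence,
        "reactivity": reactivity,
    }
```

A quick sanity check after loading is to verify that the number of reactivity values matches both the sequence length and the `length` attribute.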

The data tag’s attributes allow keeping track of the analysis performed:

Attribute	Possible values	Description
tool	rf-norm	The tool that generated this XML file
scoring	Ding, Rouskin, Siegfried, or Zubradt	Scoring method
norm	2-8%, Winsorizing 90%, or Box-plot	Normalization method
reactive	[ACGT]	Reactive bases
win	Positive integer ≥ 3	Normalization window's size (in nt)
offset	Positive integer ≥ 1	Offset for normalization window sliding
remap	TRUE/FALSE	Whether normalized reactivities have been remapped according to Zarringhalam et al., 2012
		*Scoring method #1 (Ding et al., 2014)*
max	Positive float > 0	Score threshold for capping raw reactivities
pseudo	Positive float > 0	Pseudocount added to avoid division by 0 during reactivity calculation
		*Scoring method #3 (Siegfried et al., 2014)*
maxumut	0 ≤ r ≤ 1	Maximum per-base mutation rate in untreated sample
maxmutrate	0 ≤ r ≤ 1	Maximum per-base mutation rate
		*Scoring method #4 (Zubradt et al., 2016)*
maxmutrate	0 ≤ r ≤ 1	Maximum per-base mutation rate