RF Combine allows combining XML files from multiple experiments into a single profile.
For example, this is useful in CIRS-seq experiments, where the reactivity of A/C residues probed with DMS and the reactivity of G/U residues probed with CMCT can be combined into a single profile.
Alternatively, RF Combine can combine multiple replicates of the same probing experiment into a single profile. In these cases, the resulting XML files may contain optional "-error" tags, which report the per-base standard deviation of the measure across the experiments.
When combining datasets containing NaN values, only positions that are non-NaN in all experiments are combined; every other position is reported as NaN in the combined profile.
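The NaN rule above can be sketched in a few lines of Python. This is an illustration only, not RF Combine's actual code: the function name `combine_strict` is hypothetical, the combined value is assumed to be the arithmetic mean, and the standard deviation is assumed to be the population standard deviation (the documentation does not state which estimator is used).

```python
import math
from statistics import mean, pstdev

def combine_strict(profiles):
    """Combine equal-length reactivity profiles position by position.

    A position is combined only if it is non-NaN in every profile;
    otherwise both the mean and the standard deviation are NaN.
    """
    means, stdevs = [], []
    for values in zip(*profiles):
        if any(math.isnan(v) for v in values):
            means.append(math.nan)
            stdevs.append(math.nan)
        else:
            means.append(mean(values))
            # Assumption: population standard deviation across experiments
            stdevs.append(pstdev(values))
    return means, stdevs

# Two hypothetical replicates of a 4-nt transcript
rep1 = [0.2, math.nan, 0.8, 1.0]
rep2 = [0.4, 0.5, math.nan, 1.0]
means, stdevs = combine_strict([rep1, rep2])
print(means)   # positions 2 and 3 are NaN because one replicate lacks a value
print(stdevs)
```

Only the first and last positions carry a value in both replicates, so only those two positions receive a combined reactivity.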
There is no limit to the number of experiments that RF Combine can handle. It can be used either on individual XML files or on whole folders of XML files generated by RF Norm or RF ModCall.
RF Combine can also retain only those transcripts whose correlation coefficient between experiments (Pearson by default) exceeds a user-defined threshold.
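As a rough sketch of how such a correlation filter might work, the Python below computes the Pearson correlation between two profiles and keeps a transcript only if every pair of experiments reaches the threshold. The helper names `pearson` and `passes_threshold` are hypothetical, and two details are assumptions rather than documented behavior: positions that are NaN in either profile are left out of the calculation, and a correlation exactly equal to the threshold is accepted.

```python
import math

def pearson(x, y):
    """Pearson correlation over positions that are non-NaN in both profiles."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return math.nan  # correlation is undefined for a constant profile
    return cov / math.sqrt(vx * vy)

def passes_threshold(profiles, min_correlation):
    """True only if every pairwise correlation reaches the threshold."""
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            r = pearson(profiles[i], profiles[j])
            if math.isnan(r) or r < min_correlation:
                return False
    return True

# Hypothetical replicates: rep1 and rep2 agree, rep3 does not
rep1 = [0.1, 0.5, math.nan, 0.9, 0.3]
rep2 = [0.2, 0.6, 0.4, 1.0, 0.2]
rep3 = [0.9, 0.1, 0.5, 0.2, 0.8]
print(passes_threshold([rep1, rep2], 0.8))        # two concordant replicates
print(passes_threshold([rep1, rep2, rep3], 0.8))  # one discordant replicate
```

With three or more replicates, a single poorly correlated pair is enough to reject the transcript, which is why the second call fails even though rep1 and rep2 agree well.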

Important

RF Combine does not allow combining RF Norm XML files generated using different scoring/normalization methods, since this will produce inconsistent data.

Note

In XML files generated using RF Combine, the combined attribute of the transcript tag is set to TRUE.

Usage

To list the required parameters, simply type:

$ rf-combine -h
| Parameter | Type | Description |
|---|---|---|
| -p or --processors | int | Number of processors (threads) to use (Default: 1) |
| -o or --output-dir | string | Output directory for writing combined data in XML format (Default: combined/) |
| -ow or --overwrite | | Overwrites the output directory if it already exists |
| -s or --stdev | | When combining multiple replicates, an optional "-error" tag is reported in the output XML files, containing the per-base standard deviation of the measure |
| -d or --decimals | int | Number of decimals for reporting reactivities (1-10, Default: 3) |
| -m or --min-values | float | Minimum number of values to calculate correlation (Default: off). Note: if a value between 0 and 1 is provided, it is interpreted as a fraction of the transcript's length |
| -c or --min-correlation | float | Minimum correlation to report a combined profile (-1 < r < 1, Default: off). Note: if more than two replicates are being combined, RF Combine requires this threshold to be satisfied in all pairwise comparisons |
| -S or --spearman | | Uses Spearman instead of Pearson to calculate correlation |
| -l or --log-transform | | Log-transforms reactivity values before averaging |
| -i or --ignore-NaNs | | NaNs are ignored when calculating mean reactivities. Note: this parameter enables combining XML files from experiments with different sets of reactive bases (e.g., A/C and G/U) |
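To illustrate why ignoring NaNs matters when experiments cover different sets of reactive bases, the following Python sketch averages only the values that are present at each position. It is a minimal illustration under stated assumptions, not RF Combine's implementation: the function name `combine_ignoring_nans` and the reactivity values are made up for the example.

```python
import math

def combine_ignoring_nans(profiles):
    """Average the non-NaN values at each position; NaN if none are available."""
    combined = []
    for values in zip(*profiles):
        observed = [v for v in values if not math.isnan(v)]
        combined.append(sum(observed) / len(observed) if observed else math.nan)
    return combined

nan = math.nan
sequence = "ACGU"
dms = [0.7, 0.3, nan, nan]    # hypothetical DMS profile: values only on A/C
cmct = [nan, nan, 0.5, 0.9]   # hypothetical CMCT profile: values only on G/U

combined = combine_ignoring_nans([dms, cmct])
print(dict(zip(sequence, combined)))
```

Under the strict rule, in which a position must be non-NaN in all experiments, every position of this example would be NaN, because no base is covered by both reagents.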

Note

When the value specified for --min-values is interpreted as a fraction of the transcript's length, only reactive bases (specified by the XML reactive attribute; for additional details, please refer to the RF Norm documentation) are considered. For example, if a transcript containing 25% of each base has been modified with DMS (which only modifies A/C residues), setting --min-values to 0.5 will cause RF Combine to skip the transcript if more than 50% of the A/C residues are NaNs.
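The arithmetic behind this example can be worked through with a short Python sketch. The helper names `min_values_required` and `has_enough_values` are hypothetical, and the sketch assumes that a value strictly between 0 and 1 is treated as a fraction and that a transcript with exactly the required number of values is kept.

```python
import math

def min_values_required(sequence, reactive, min_values):
    """Translate --min-values into an absolute number of values."""
    if 0 < min_values < 1:
        # Fractions refer to reactive bases only, not to the full length
        n_reactive = sum(1 for base in sequence if base in reactive)
        return min_values * n_reactive
    return min_values

def has_enough_values(sequence, reactive, profile, min_values):
    """True if enough reactive bases carry a non-NaN reactivity."""
    covered = sum(1 for base, value in zip(sequence, profile)
                  if base in reactive and not math.isnan(value))
    return covered >= min_values_required(sequence, reactive, min_values)

# 20-nt transcript with 25% of each base, probed with DMS (A/C reactive)
sequence = "ACGU" * 5
reactive = "AC"
profile = [0.5 if base in reactive else math.nan for base in sequence]

# Mask 6 of the 10 A/C positions, leaving only 4 usable values
ac_positions = [i for i, base in enumerate(sequence) if base in reactive]
for i in ac_positions[:6]:
    profile[i] = math.nan

print(min_values_required(sequence, reactive, 0.5))          # 5.0
print(has_enough_values(sequence, reactive, profile, 0.5))   # False
```

Although the transcript is 20 nt long, the threshold is 5 values rather than 10, because only the 10 A/C residues count toward the fraction.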