RF Combine allows combining XML files from multiple experiments into a single profile.
For example, this is useful in CIRS-seq experiments, where the reactivities of A/C residues probed with DMS and of G/U residues probed with CMCT can be merged into a single profile.
Alternatively, RF Combine can merge multiple replicates of the same probing experiment into a single profile. In this case, the resulting XML files may contain optional “-error” tags, which report the per-base standard deviation of the measurements across experiments.
When combining datasets containing NaN values, only positions that are non-NaN in all experiments are combined; every other position is reported as NaN in the combined profile.
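The NaN rule and the per-base standard deviation can be illustrated with a minimal Python sketch. This is not RF Combine's own code: the function name `combine_replicates` is hypothetical, the combination is assumed to be a plain arithmetic mean, and the use of the population standard deviation (rather than the sample one) is an assumption made only for illustration.

```python
import math
from statistics import mean, pstdev

def combine_replicates(profiles, decimals=3):
    """Average per-base reactivities across replicates (illustrative only).

    A position is combined only if it is non-NaN in every replicate;
    otherwise both the combined value and its error are reported as NaN.
    """
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("all profiles must have the same length")
    combined, errors = [], []
    for values in zip(*profiles):
        if any(math.isnan(v) for v in values):
            combined.append(math.nan)
            errors.append(math.nan)
        else:
            combined.append(round(mean(values), decimals))
            # Assumption: population standard deviation across replicates.
            errors.append(round(pstdev(values), decimals))
    return combined, errors

rep1 = [0.10, math.nan, 0.80, 0.40]
rep2 = [0.30, 0.50, 0.60, math.nan]
combined, errors = combine_replicates([rep1, rep2])
print(combined)  # positions 2 and 4 are NaN because one replicate lacks a value
print(errors)
```

The `decimals` argument mirrors the idea of reporting reactivities with a fixed number of decimals; how the tool itself rounds is not specified here.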
There is no limit to the number of experiments that RF Combine can handle. It can be used either on individual XML files or on whole XML folders generated by RF Norm or RF ModCall.
RF Combine can also retain only those transcripts whose correlation between experiments (Pearson by default) exceeds a user-defined threshold.
Important
RF Combine does not allow combining RF Norm XML files generated using different scoring/normalization methods, since this will produce inconsistent data.
Note
In XML files generated using RF Combine, the combined attribute of the transcript tag is set to TRUE.
Usage
To list the required parameters, simply type:
$ rf-combine -h
Parameter | Type | Description |
---|---|---|
-p or --processors | int | Number of processors (threads) to use (Default: 1) |
-o or --output-dir | string | Output directory for writing combined data in XML format (Default: combined/) |
-ow or --overwrite | | Overwrites the output directory if it already exists |
-s or --stdev | | When combining multiple replicates, an optional "-error" tag will be reported in the output XML files, containing the per-base standard deviation of the measure |
-d or --decimals | int | Number of decimals for reporting reactivities (1-10, Default: 3) |
-m or --min-values | float | Minimum number of values to calculate correlation (Default: off) Note: if a value between 0 and 1 is provided, this is interpreted as a fraction of the transcript's length |
-c or --min-correlation | float | Minimum correlation to report a combined profile (-1<r<1, Default: off) Note: if more than two replicates are being combined, RF Combine requires this threshold to be satisfied in all pairwise comparisons |
-S or --spearman | | Uses Spearman instead of Pearson to calculate correlation |
-l or --log-transform | | Log-transforms reactivity values before averaging |
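The pairwise requirement described for --min-correlation can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the function names `pearson` and `passes_min_correlation` are hypothetical, the correlation is computed only over positions that are non-NaN in both profiles, and a correlation equal to the threshold is assumed to pass.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation over positions that are non-NaN in both profiles."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return math.nan  # correlation is undefined for a constant profile
    return cov / math.sqrt(vx * vy)

def passes_min_correlation(profiles, min_correlation):
    """True only if every pair of replicates reaches the threshold.

    An undefined (NaN) correlation never satisfies the comparison,
    so such a transcript is treated as failing.
    """
    return all(pearson(a, b) >= min_correlation
               for a, b in combinations(profiles, 2))

rep1 = [0.1, 0.2, 0.3, 0.4]
rep2 = [0.2, 0.4, 0.6, 0.8]   # perfectly correlated with rep1
rep3 = [0.4, 0.3, 0.2, 0.1]   # anti-correlated with rep1 and rep2

print(passes_min_correlation([rep1, rep2], 0.9))        # True
print(passes_min_correlation([rep1, rep2, rep3], 0.9))  # False
```

With three replicates there are three pairwise comparisons, and a single pair below the threshold is enough to discard the transcript.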
Note
When the value specified for --min-values is interpreted as a fraction of the transcript's length, only reactive bases (specified by the XML reactive attribute; for additional details, please refer to the RF Norm documentation) are considered. For example, if a transcript containing 25% of each base has been modified with DMS (that only modifies A/C residues), setting --min-values to 0.5 will cause RF Combine to skip the transcript if more than 50% of the A/C residues are NaNs.
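The fractional interpretation of --min-values can be worked through with a short Python sketch. The helper names `required_values` and `enough_values` are hypothetical, and two details are assumptions made for illustration: fractions are taken to be values strictly between 0 and 1, and the required count is rounded up.

```python
import math

def required_values(min_values, n_reactive):
    """Convert a --min-values setting into an absolute number of positions."""
    if 0 < min_values < 1:
        # Fraction of the reactive bases only, not of the whole transcript.
        return math.ceil(min_values * n_reactive)  # rounding up is an assumption
    return int(min_values)

def enough_values(sequence, profiles, reactive, min_values):
    """True if enough reactive positions are non-NaN in every profile."""
    reactive_idx = [i for i, base in enumerate(sequence) if base in reactive]
    covered = sum(1 for i in reactive_idx
                  if not any(math.isnan(p[i]) for p in profiles))
    return covered >= required_values(min_values, len(reactive_idx))

nan = math.nan
seq = "ACGUACGU"  # 25% of each base; DMS reactive bases are A and C (4 positions)
rep1 = [0.2, nan, nan, nan, 0.5, 0.1, nan, nan]  # 1 of 4 A/C values missing
rep2 = [0.3, nan, nan, nan, 0.4, nan, nan, nan]  # 2 of 4 A/C values missing
rep3 = [nan, nan, nan, nan, 0.4, nan, nan, nan]  # 3 of 4 A/C values missing

print(enough_values(seq, [rep1, rep2], "AC", 0.5))  # True: 2 of 4 A/C positions usable
print(enough_values(seq, [rep1, rep3], "AC", 0.5))  # False: only 1 of 4 usable
```

In the second call more than 50% of the A/C residues are NaN in at least one replicate, so the transcript would be skipped, matching the example in the note above.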