RF Correlate calculates pairwise correlations between structure probing experiments. It can be invoked on individual transcripts, on whole XML folders, or on RC files.
Both overall and per-transcript correlations are reported in TSV format.
Since v2.9.6, RF Correlate can calculate correlations between any number of experiments (many vs. many), and it can handle genome-level RC files (such as those generated by rf-count-genome).
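The many vs. many comparison can be pictured with a minimal Python sketch. This is not RF Correlate's own code: the function names (`pearson`, `ranks`, `spearman`, `pairwise`) and the sample values are illustrative assumptions, and Spearman is computed here as Pearson on average ranks.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise(samples, method=pearson):
    """Correlate every pair of samples (many vs. many)."""
    return {
        (a, b): method(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Made-up per-base values for three hypothetical samples.
samples = {"s1": [1, 2, 3, 4], "s2": [2, 4, 6, 8], "s3": [1, 4, 9, 16]}
matrix = pairwise(samples)
ranked = pairwise(samples, method=spearman)
```

With N samples this yields N(N-1)/2 comparisons, one per unordered pair.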
Note
When directly comparing two XML files, no check is made on the transcript ID, hence allowing the direct correlation of any two XML files (provided that they share the same sequence).
Usage
$ rf-correlate [options] sample1.rc sample2.rc .. sampleN.rc
$ rf-correlate [options] rna1.xml rna2.xml .. rnaN.xml
$ rf-correlate [options] XML_dir_1/ XML_dir_2/ .. XML_dir_N/
To list all available parameters, simply type:
$ rf-correlate -h
| Parameter | Type | Description |
|---|---|---|
| -p or --processors | int | Number of processors to use (Default: 1) |
| -o or --output | string | Output folder (Default: rf_correlate/) |
| -ow or --overwrite | | Overwrites output folder (if the specified folder already exists) |
| -m or --min-values | float | Minimum number of values to calculate correlation (Default: off) Note: if a value between 0 and 1 is provided, this is interpreted as a fraction of the transcript's length |
| -cr or --cap-react | float | Maximum reactivity value to cap reactivities to (> 0, Default: 1e9) Note: if processing RC files, this parameter only applies to ratios (-r) |
| -mr or --max-react | float | Reactivity values above this threshold will be excluded from correlation calculation (> 0, Default: none) Note: if processing RC files, this parameter only applies to ratios (-r) |
| -I or --ignore-sequence | | Ignores sequence differences (e.g. SNVs) between the compared transcripts |
| -S or --spearman | | Uses Spearman instead of Pearson to calculate correlation |
| RC file-specific options | | |
| -i or --index | string | An RCI index file to be used for all input RC files Note: if no RCI index is provided, RF Correlate will look for files with .rci extension in the same folder as the input RC files, named after the RC files (e.g., for Sample.rc it will look for Sample.rc.rci). If no RCI file is found, it will be created at runtime and stored in the same folder as the input RC files. |
| -kb or --keep-bases | string | Bases on which correlation should be calculated (Default: all) Note: this option only affects RC files. For XML files, reactive bases are automatically identified from the reactive attribute |
| -mc or --min-coverage | int | Restricts the correlation analysis to bases exceeding this coverage |
| -c or --coverage | | Correlation is calculated on the coverage, rather than on the raw RT stop/mutation counts |
| -r or --ratio | | Correlation is calculated on the ratio between the RT stop/mutation counts and the coverage, rather than on the raw RT stop/mutation counts |
| -bs or --block-size | int | Defines the size of the memory block (in bp) to process RC files containing whole chromosome data (such as those generated by rf-count-genome) (≥1, Default: 100000) |
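
The difference between capping (-cr) and excluding (-mr) values, and the ratio used by -r, can be illustrated with a short Python sketch. It reflects one reading of the parameter descriptions above, not RF Correlate's actual implementation. The helper names are hypothetical, and marking zero-coverage bases as NaN is an assumption.

```python
from math import isnan, nan


def ratios(counts, coverage):
    """Per-base RT stop/mutation count divided by coverage.

    Assumption: bases with zero coverage are treated as missing (NaN).
    """
    return [c / cov if cov > 0 else nan for c, cov in zip(counts, coverage)]


def cap_react(values, cap):
    """Capping: values above the cap are kept, but lowered to the cap."""
    return [v if isnan(v) else min(v, cap) for v in values]


def max_react(values, threshold):
    """Exclusion: values above the threshold are dropped (set to NaN)."""
    return [nan if (not isnan(v) and v > threshold) else v for v in values]


# Made-up counts and coverage for three bases.
per_base = ratios([2, 0, 5], [10, 0, 5])
capped = cap_react([0.2, 3.0, nan], 1.0)
filtered = max_react([0.2, 3.0], 1.0)
```

Capping leaves the number of usable values unchanged, whereas exclusion reduces it, which may interact with --min-values.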
Note
When the value passed to --min-values is interpreted as a fraction of the transcript's length, only reactive bases (specified by the reactive attribute for XML files, or via the --keep-bases parameter for RC files) are considered. For example, if a transcript containing 25% of each base has been modified with DMS (which only modifies A/C residues), setting --min-values to 0.5 will cause RF Correlate to skip the transcript if more than 50% of the A/C residues are NaNs (or do not exceed the --min-coverage threshold for RC files).
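The rule in this note can be sketched in Python as follows. This is an interpretation of the documented behaviour, not the tool's code. The function name `enough_values` is hypothetical. Two further assumptions are made: a position counts only when it is non-NaN in both samples, and only values strictly between 0 and 1 are treated as fractions.

```python
from math import isnan, nan


def enough_values(sequence, values_a, values_b, min_values, keep_bases="ACGT"):
    """Return True if a transcript has enough usable values to be correlated."""
    # Only reactive bases are considered (e.g. "AC" for DMS).
    reactive = [i for i, base in enumerate(sequence) if base in keep_bases]
    # Assumption: a position is usable only if both samples have a value there.
    usable = sum(
        1 for i in reactive
        if not isnan(values_a[i]) and not isnan(values_b[i])
    )
    # Assumption: values strictly between 0 and 1 are fractions of the
    # reactive bases; anything else is an absolute count.
    if 0 < min_values < 1:
        required = min_values * len(reactive)
    else:
        required = min_values
    return usable >= required


# Transcript with 25% of each base; DMS probing, so only A/C are reactive.
seq = "ACGTACGT"
sample_b = [0.2] * 8
half_missing = [0.1, nan, nan, nan, nan, 0.3, nan, nan]   # 2 of 4 A/C usable
mostly_missing = [0.1, nan, nan, nan, nan, nan, nan, nan]  # 1 of 4 A/C usable

kept = enough_values(seq, half_missing, sample_b, 0.5, keep_bases="AC")
skipped = enough_values(seq, mostly_missing, sample_b, 0.5, keep_bases="AC")
```

Here the first transcript is kept because exactly half of the A/C residues are usable, while the second is skipped because more than 50% of them are NaN.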
Sample labeling
By default, RF Correlate uses the input file names (stripped of their extension) as labels in the output files.
It is, however, possible to specify custom labels by prepending them to the input files, separated by a colon (label:input), for example:
$ rf-correlate [options] Sample_1:XML_dir_1/ Sample_2:XML_dir_2/ .. Sample_N:XML_dir_N/
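As a rough illustration of the labeling rule, the Python sketch below splits an input argument into a label and a path, and falls back to the file name without its extension when no label is given. The function `parse_input` is hypothetical and only mirrors the behaviour described above. How RF Correlate itself parses these arguments is not shown in this document.

```python
import os


def parse_input(spec):
    """Split 'label:path' into (label, path).

    If no label is given, derive it from the file or folder name,
    stripped of its extension.
    """
    label, separator, path = spec.partition(":")
    if separator and label and path:
        return label, path
    name = os.path.basename(spec.rstrip("/"))
    return os.path.splitext(name)[0], spec


labeled = parse_input("Sample_1:XML_dir_1/")
unlabeled_file = parse_input("data/Sample.rc")
unlabeled_dir = parse_input("XML_dir_2/")
```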
Output files
RF Correlate generates an output folder containing:
- matrix.csv: a CSV matrix of the pairwise correlation coefficients between all samples
- pairwise/: a folder containing a TSV file for each pairwise comparison, with a list of correlation coefficients (and corresponding p-values) for each transcript (or genomic block)
- heatmap.pdf (optional, requires -g): a clustered heatmap of the pairwise correlation coefficients between all samples
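For downstream analysis, the correlation matrix can be loaded with a few lines of Python. The exact layout of matrix.csv is not described here, so this sketch assumes a comma-separated file whose first row and first column hold the sample labels. The example content is made up, and `read_matrix` is a hypothetical helper.

```python
import csv
import io


def read_matrix(handle):
    """Read a labeled square matrix into a nested dict: matrix[row][column]."""
    rows = [row for row in csv.reader(handle) if row]
    # Assumption: the first row is a header; its first cell is a corner cell.
    labels = rows[0][1:]
    return {
        row[0]: {labels[i]: float(value) for i, value in enumerate(row[1:])}
        for row in rows[1:]
    }


# Made-up content standing in for rf_correlate/matrix.csv.
example = io.StringIO(
    ",Sample_1,Sample_2\n"
    "Sample_1,1.0,0.93\n"
    "Sample_2,0.93,1.0\n"
)
matrix = read_matrix(example)
```

To read a real file, pass an open file handle instead, e.g. `open("rf_correlate/matrix.csv", newline="")`, and adjust the parsing if the actual layout differs.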
