RF Correlate calculates pairwise correlations between structure probing experiments. It can be invoked on individual transcripts (XML files), on whole XML folders, or on RC files.
Both overall and per-transcript correlations are reported in TSV format.

Since v2.9.6, RF Correlate can calculate correlations between any number of experiments (many vs. many), and it can handle genome-level RC files (such as those generated by rf-count-genome).
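
To make the "many vs. many" idea concrete, here is a minimal Python sketch that correlates every unordered pair of samples. It is not RF Correlate's implementation: the function names, the sample labels and the reactivity values are hypothetical, and it assumes that positions holding NaN in either sample are simply left out of the calculation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation over positions where neither value is NaN."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for constant profiles
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(samples):
    """Return {(label_a, label_b): r} for every unordered pair of samples."""
    return {
        (a, b): pearson(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical reactivity profiles for three samples (NaN = no data).
samples = {
    "rep1": [0.1, 0.8, math.nan, 0.4, 0.9],
    "rep2": [0.2, 0.7, 0.3, 0.5, 1.0],
    "rep3": [0.9, 0.1, 0.5, math.nan, 0.2],
}
result = pairwise_correlations(samples)
for (a, b), r in result.items():
    print(f"{a}\t{b}\t{r:.3f}")
```

With N samples this produces N(N-1)/2 comparisons, which is why the number of pairwise results grows quickly as more experiments are added.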

Note

When directly comparing two XML files, no check is made on the transcript ID, hence allowing the direct correlation of any two XML files (provided that they share the same sequence).


Usage

$ rf-correlate [options] sample1.rc sample2.rc .. sampleN.rc
$ rf-correlate [options] rna1.xml rna2.xml .. rnaN.xml
$ rf-correlate [options] XML_dir_1/ XML_dir_2/ .. XML_dir_N/

To list all available parameters, simply type:

$ rf-correlate -h

| Parameter | Type | Description |
| --- | --- | --- |
| -p or --processors | int | Number of processors to use (Default: 1) |
| -o or --output | string | Output folder (Default: rf_correlate/) |
| -ow or --overwrite | | Overwrites output folder (if the specified folder already exists) |
| -m or --min-values | float | Minimum number of values to calculate correlation (Default: off). Note: if a value between 0 and 1 is provided, this is interpreted as a fraction of the transcript's length |
| -cr or --cap-react | float | Maximum reactivity value to cap reactivities to (> 0, Default: 1e9). Note: if processing RC files, this parameter only applies to ratios (-r) |
| -mr or --max-react | float | Reactivity values above this threshold will be excluded from correlation calculation (> 0, Default: none). Note: if processing RC files, this parameter only applies to ratios (-r) |
| -I or --ignore-sequence | | Ignores sequence differences (e.g. SNVs) between the compared transcripts |
| -S or --spearman | | Uses Spearman instead of Pearson to calculate correlation |

RC file-specific options

| Parameter | Type | Description |
| --- | --- | --- |
| -i or --index | string | An RCI index file to be used for all input RC files. Note: if no RCI index is provided, RF Correlate will look for files with .rci extension in the same input folder as the RC files, named after the RC files (e.g., Sample.rc will look for Sample.rc.rci). If no RCI file is found, it will be created at runtime and stored in the same folder as the input RC files |
| -kb or --keep-bases | string | Bases on which correlation should be calculated (Default: all). Note: this option only affects RC files. For XML files, reactive bases are automatically identified from the reactive attribute |
| -mc or --min-coverage | int | Restricts the correlation analysis to bases exceeding this coverage |
| -c or --coverage | | Correlation is calculated on the coverage, rather than on the raw RT stop/mutation counts |
| -r or --ratio | | Correlation is calculated on the ratio between the RT stop/mutation counts and the coverage, rather than on the raw RT stop/mutation counts |
| -bs or --block-size | int | Defines the size of the memory block (in bp) to process RC files containing whole chromosome data (such as those generated by rf-count-genome) (≥1, Default: 100000) |
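
The difference between capping (-cr) and excluding (-mr) values, and the ratio computed by -r, can be easy to mix up. The Python sketch below illustrates the three operations on toy data. It is an illustration only, not RF Correlate's code: the function names are hypothetical, excluded positions are represented as NaN by assumption, and the order in which the tool applies these steps is not stated in this document.

```python
import math


def ratios(counts, coverage, min_coverage=0):
    """Per-base count/coverage ratio (as with -r).

    Bases whose coverage does not exceed min_coverage (as with -mc)
    are set to NaN so that they are left out of the correlation.
    """
    return [
        c / cov if cov > min_coverage and cov > 0 else math.nan
        for c, cov in zip(counts, coverage)
    ]


def cap_values(values, cap):
    """Clip values above `cap` down to `cap`; the position is kept (as with -cr)."""
    return [v if math.isnan(v) else min(v, cap) for v in values]


def exclude_above(values, threshold):
    """Replace values above `threshold` with NaN; the position is dropped (as with -mr)."""
    return [
        math.nan if (not math.isnan(v) and v > threshold) else v
        for v in values
    ]


values = [0.2, 1.5, 3.0]
print(cap_values(values, 1.0))      # high values are kept, but clipped to 1.0
print(exclude_above(values, 1.0))   # high values become NaN

# Hypothetical RT stop/mutation counts and coverage for three bases.
print(ratios([5, 0, 20], [100, 0, 40], min_coverage=50))
```

Capping keeps outliers in the calculation with a bounded influence, while exclusion removes them altogether, which also reduces the number of values counted towards --min-values.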

Note

When the value specified via --min-values is interpreted as a fraction of the transcript's length, only reactive bases (specified by the reactive attribute for XML files, or via the --keep-bases parameter for RC files) are considered. For example, if a transcript containing 25% of each base has been modified with DMS (which only modifies A/C residues), setting --min-values to 0.5 will cause RF Correlate to skip the transcript if more than 50% of the A/C residues are NaNs (or do not exceed the --min-coverage threshold for RC files).
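
The rule above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the tool's actual logic: the function names are hypothetical, a single profile is checked rather than a pair, and values strictly between 0 and 1 are treated as fractions while anything else is treated as an absolute count.

```python
import math


def enough_values(sequence, values, reactive_bases, min_values):
    """Return True if the transcript has enough usable values to be correlated."""
    # Only reactive bases (e.g. A/C for DMS) are considered.
    reactive = [v for base, v in zip(sequence, values) if base in reactive_bases]
    valid = sum(1 for v in reactive if not math.isnan(v))
    if 0 < min_values < 1:
        required = min_values * len(reactive)  # fraction of the reactive bases
    else:
        required = min_values                  # absolute number of values
    return valid >= required


# Toy transcript with 25% of each base: 4 A + 4 C = 8 reactive bases for DMS.
sequence = "ACGU" * 4


def profile(n_missing):
    """Hypothetical profile: G/U are NaN, as are the first n_missing A/C bases."""
    values, missing = [], 0
    for base in sequence:
        if base in "AC" and missing >= n_missing:
            values.append(0.5)
        else:
            values.append(math.nan)
            if base in "AC":
                missing += 1
    return values


print(enough_values(sequence, profile(4), "AC", 0.5))  # exactly 50% of A/C are NaN
print(enough_values(sequence, profile(5), "AC", 0.5))  # more than 50% of A/C are NaN
```

In this sketch the first transcript is kept (4 of the 8 A/C values are usable) and the second is skipped (only 3 of 8 are usable), matching the "more than 50%" wording of the note.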


Sample labeling

By default, RF Correlate uses the input file names (stripped of their extension) as labels in the output files.

It is, however, possible to specify custom labels by prepending them to the input files, separated by a colon (label:input):

$ rf-correlate [options] Sample_1:XML_dir_1/ Sample_2:XML_dir_2/ .. Sample_N:XML_dir_N/
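
For readers scripting around RF Correlate, the labeling convention can be mimicked with a short Python helper. This is a hypothetical sketch, not the tool's own parser: parse_input is an invented name, the argument is split on the first colon only, and input paths are assumed not to contain colons themselves.

```python
import os


def parse_input(argument):
    """Return (label, path) for a 'label:path' or plain 'path' argument."""
    label, sep, path = argument.partition(":")
    if sep and label and path:
        return label, path
    # No custom label: fall back to the file (or folder) name without extension.
    name = os.path.basename(argument.rstrip("/"))
    return os.path.splitext(name)[0], argument


for arg in ["Sample_1:XML_dir_1/", "XML_dir_2/", "data/rna1.xml", "sample1.rc"]:
    print(parse_input(arg))
```

The fallback branch mirrors the default behavior described above, where the input file name stripped of its extension is used as the label.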


Output files

RF Correlate generates an output folder containing:

  1. matrix.csv: a CSV matrix of the pairwise correlation coefficients between all samples
  2. pairwise/: a folder containing a TSV file for each pairwise comparison, with a list of correlation coefficients (and corresponding p-values) for each transcript (or genomic block)
  3. heatmap.pdf (optional, requires -g): a clustered heatmap of the pairwise correlation coefficients between all samples


RF Correlate Heatmap