RF Correlate calculates pairwise correlations between structure probing experiments. It can be invoked on individual transcripts, on whole XML folders, or on RC files.
Both overall and per-transcript correlations are reported in CSV format.

Note

When directly comparing two XML files, no check is made on the transcript ID, hence allowing the direct correlation of any two XML files (provided that they are of the same length).


Usage

To list the required parameters, simply type:

$ rf-correlate -h
| Parameter | Type | Description |
| --- | --- | --- |
| -p or --processors | int | Number of processors to use (Default: 1) |
| -o or --output | string | Output CSV file (Default: rf_correlate.csv) |
| -ow or --overwrite | | Overwrites output file (if the specified file already exists) |
| -m or --min-values | float | Minimum number of values to calculate correlation (Default: off). Note: if a value between 0 and 1 is provided, this is interpreted as a fraction of the transcript's length |
| -s or --skip-overall | | Skips overall experiment correlation calculation (faster) |
| -i or --ignore-sequence | | Ignores sequence differences (e.g. SNVs) between the compared transcripts |
| -S or --spearman | | Uses Spearman instead of Pearson to calculate correlation |
| | | RC file-specific options |
| -kb or --keep-bases | string | Bases on which correlation should be calculated (Default: all). Note: this option has effect only on RC files. For XML files, reactive bases are automatically identified from the reactive attribute |
| -mc or --min-coverage | int | Restricts the correlation analysis to bases exceeding this coverage |
| -c or --coverage | | Correlation is calculated on the coverage, rather than on the raw RT stop/mutation counts |
| -r or --ratio | | Correlation is calculated on the ratio between the RT stop/mutation counts and the coverage, rather than on the raw RT stop/mutation counts |
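
To make the difference between the default Pearson correlation and the --spearman option concrete, the following is a minimal Python sketch of both coefficients computed on two reactivity profiles. It is not RF Correlate's own implementation; the function names are hypothetical, and skipping positions where either profile is NaN is an assumption made for this illustration. P-values are not computed here.

```python
import math

def paired_values(x, y):
    """Keep only positions where both profiles hold a non-NaN value."""
    return [(a, b) for a, b in zip(x, y)
            if not (math.isnan(a) or math.isnan(b))]

def pearson(x, y):
    """Pearson correlation on the positions shared by both profiles."""
    pairs = paired_values(x, y)
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson computed on the ranks."""
    pairs = paired_values(x, y)
    if len(pairs) < 2:
        return float("nan")
    xs, ys = zip(*pairs)
    return pearson(ranks(list(xs)), ranks(list(ys)))

# Hypothetical profiles: the relationship is monotonic but not linear,
# and the last position is missing in the first profile.
profile_a = [1.0, 2.0, 3.0, 4.0, float("nan")]
profile_b = [1.0, 4.0, 9.0, 16.0, 5.0]
print(pearson(profile_a, profile_b), spearman(profile_a, profile_b))
```

Because Spearman works on ranks, it reports a perfect correlation for any strictly increasing relationship, whereas Pearson only does so when the relationship is linear.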

Note

When the value passed to --min-values is interpreted as a fraction of the transcript's length, only reactive bases (specified by the reactive attribute for XML files, or via the --keep-bases parameter for RC files) are considered. For example, if a transcript containing 25% of each base has been modified with DMS (which only modifies A/C residues), setting --min-values to 0.5 will cause RF Correlate to skip the transcript if more than 50% of the A/C residues are NaNs (or do not exceed the --min-coverage threshold for RC files).
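
The rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual code: the function name is hypothetical, values strictly between 0 and 1 are treated as fractions, and any other value is treated as an absolute count.

```python
import math

def has_enough_values(sequence, values, min_values, reactive_bases="AC"):
    """Return True if the transcript would be kept under --min-values.

    sequence        transcript sequence
    values          per-base reactivities, NaN where no value is available
    min_values      absolute count, or a fraction if strictly between 0 and 1
                    (boundary handling is an assumption of this sketch)
    reactive_bases  bases that can carry a value (A/C for DMS)
    """
    # Only reactive bases count towards the threshold.
    reactive = [v for base, v in zip(sequence, values) if base in reactive_bases]
    covered = sum(1 for v in reactive if not math.isnan(v))
    if 0 < min_values < 1:
        required = min_values * len(reactive)
    else:
        required = min_values
    return covered >= required

nan = float("nan")
sequence = "ACGUACGU"  # 25% of each base, so 4 of 8 bases are A/C

# 2 of the 4 A/C residues have a value: exactly 50%, so the transcript is kept.
kept = has_enough_values(sequence, [0.1, nan, nan, nan, 0.3, nan, nan, nan], 0.5)

# Only 1 of the 4 A/C residues has a value: more than 50% are NaN, so it is skipped.
skipped = not has_enough_values(sequence, [0.1, nan, nan, nan, nan, nan, nan, nan], 0.5)

print(kept, skipped)
```

Note that the fraction is taken over the 4 A/C residues, not over the full 8-nt transcript.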


Output file

RF Correlate generates a CSV file containing 3 fields:

  1. Transcript ID
  2. Correlation coefficient
  3. P-value
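
As a sketch of how the three fields might be consumed downstream, the following Python snippet reads such a file into tuples. The function name and the sample rows are hypothetical, and the field delimiter is passed explicitly because it is an assumption here; check it against an actual output file before use.

```python
import csv
import io

def read_correlations(handle, delimiter):
    """Parse (transcript ID, correlation, p-value) rows from an open file."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        transcript_id, coefficient, p_value = fields
        try:
            rows.append((transcript_id, float(coefficient), float(p_value)))
        except ValueError:
            continue  # non-numeric line, e.g. a header if one is present
    return rows

# Hypothetical content, for illustration only.
sample = "transcript_1,0.91,1e-20\ntranscript_2,0.45,0.003\n"
rows = read_correlations(io.StringIO(sample), ",")

# Example: keep transcripts with a correlation of at least 0.9.
well_correlated = [tid for tid, r, _ in rows if r >= 0.9]
print(well_correlated)
```

For a real file, replace the `io.StringIO` object with `open("rf_correlate.csv", newline="")`.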