RF Correlate calculates pairwise correlations between structure probing experiments. It can be run on individual transcripts (XML files), on whole folders of XML files, or on RC files.
Both the overall and the per-transcript correlations are reported in CSV format.
When two XML files are compared directly, the transcript ID is not checked, so any two XML files can be correlated directly (provided that they have the same length).
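As a rough illustration of the comparison described above, the Python sketch below computes a Pearson correlation between two equal-length reactivity profiles, skipping positions that are NaN in either profile. The function names (pearson, correlate_experiments, to_csv), the dictionary layout, the CSV columns and, in particular, the pooled "overall" value (obtained here by concatenating all shared transcripts) are assumptions made for illustration only; this is not RF Correlate's actual implementation.

```python
import csv
import io
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles, skipping NaN pairs."""
    if len(x) != len(y):
        # Mirrors the requirement that compared profiles have the same length
        raise ValueError("profiles must have the same length")
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / math.sqrt(vx * vy)


def correlate_experiments(exp1, exp2):
    """Per-transcript and pooled correlations between two experiments.

    exp1/exp2 map a transcript ID to its reactivity profile (hypothetical layout).
    ASSUMPTION: the 'overall' value pools all shared transcripts into one
    profile; RF Correlate may compute its overall correlation differently.
    """
    shared = sorted(set(exp1) & set(exp2))
    per_transcript = {tid: pearson(exp1[tid], exp2[tid]) for tid in shared}
    pooled1 = [v for tid in shared for v in exp1[tid]]
    pooled2 = [v for tid in shared for v in exp2[tid]]
    return per_transcript, pearson(pooled1, pooled2)


def to_csv(per_transcript, overall):
    """Render the results as CSV text (column names are illustrative only)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["transcript", "correlation"])
    for tid, r in per_transcript.items():
        writer.writerow([tid, f"{r:.3f}"])
    writer.writerow(["overall", f"{overall:.3f}"])
    return buf.getvalue()
```

Typical use would be `per, overall = correlate_experiments(exp1, exp2)` followed by `print(to_csv(per, overall))`; transcripts present in only one experiment are simply ignored in this sketch.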
To list all available parameters, simply type:
$ rf-correlate -h
| Parameter | Type | Description |
|---|---|---|
| -p or --processors | int | Number of processors to use (Default: 1) |
| -o or --output | string | Output CSV file (Default: rf_correlate.csv) |
| -ow or --overwrite | | Overwrites the output file (if the specified file already exists) |
| -m or --min-values | float | Minimum number of values required to calculate correlation (Default: off). Note: if a value between 0 and 1 is provided, it is interpreted as a fraction of the transcript's length |
| -s or --skip-overall | | Skips the overall experiment correlation calculation (faster) |
| -i or --ignore-sequence | | Ignores sequence differences (e.g. SNVs) between the compared transcripts |
| -S or --spearman | | Uses Spearman instead of Pearson to calculate correlation |
| **RC file-specific options** | | |
| -kb or --keep-bases | string | Bases on which correlation should be calculated (Default: all). Note: this option only has effect on RC files. For XML files, reactive bases are automatically identified from the reactive attribute |
| -mc or --min-coverage | int | Restricts the correlation analysis to bases exceeding this coverage |
| -c or --coverage | | Correlation is calculated on the coverage, rather than on the raw RT stop/mutation counts |
| -r or --ratio | | Correlation is calculated on the ratio between the RT stop/mutation counts and the coverage, rather than on the raw RT stop/mutation counts |
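The effect of -S, -kb, -mc, -c and -r can be sketched in Python as follows. This is a hypothetical illustration, not RF Correlate's code: Spearman correlation is computed as the Pearson correlation of average ranks (the standard definition), and rc_correlation mimics the RC options on plain Python lists. The function names and argument layout are invented here, and requiring both coverages to exceed min_coverage is an assumption.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; NaN if fewer than 2 points or a constant profile."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rc_correlation(seq, counts1, cov1, counts2, cov2,
                   keep_bases=None, min_coverage=0, mode="counts", use_spearman=False):
    """Correlate two RC-style profiles (per-base RT stop/mutation counts + coverage).

    mode: "counts" (default), "coverage" (like -c) or "ratio" (like -r).
    keep_bases=None keeps every base (like the "all" default of -kb).
    ASSUMPTION: a position is kept only if BOTH coverages exceed min_coverage
    (like -mc); with the default of 0, zero-coverage positions are dropped too,
    which keeps the ratio mode free of divisions by zero.
    """
    if mode not in ("counts", "coverage", "ratio"):
        raise ValueError(f"unknown mode: {mode}")
    xs, ys = [], []
    for i, base in enumerate(seq):
        if keep_bases is not None and base not in keep_bases:
            continue
        if cov1[i] <= min_coverage or cov2[i] <= min_coverage:
            continue
        if mode == "coverage":
            a, b = cov1[i], cov2[i]
        elif mode == "ratio":
            a, b = counts1[i] / cov1[i], counts2[i] / cov2[i]
        else:
            a, b = counts1[i], counts2[i]
        xs.append(float(a))
        ys.append(float(b))
    return spearman(xs, ys) if use_spearman else pearson(xs, ys)
```

Ranking makes Spearman insensitive to monotonic transformations of the data (for example, [1, 4, 9, 16] versus [1, 2, 3, 4] gives exactly 1 with Spearman but less than 1 with Pearson), which can help when the two experiments differ in scale.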
When the value passed to --min-values is interpreted as a fraction of the transcript's length, only reactive bases (specified by the reactive attribute for XML files, or via the --keep-bases parameter for RC files) are considered. For example, if a transcript containing 25% of each base has been modified with DMS (which only modifies A/C residues), setting --min-values to 0.5 will cause RF Correlate to skip the transcript if more than 50% of the A/C residues are NaNs (or do not exceed the --min-coverage threshold for RC files).
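A minimal Python sketch of the --min-values check described above, assuming a transcript is kept when the number of non-NaN values on reactive bases reaches the threshold. The function name, its arguments, counting only reactive bases for absolute thresholds too, and treating a value of exactly 1 as an absolute count are all assumptions, not documented RF Correlate behavior.

```python
import math


def passes_min_values(values, sequence, min_values, reactive_bases):
    """Decide whether a transcript has enough values to be correlated.

    values: per-base reactivities (NaN = missing); sequence: same length.
    reactive_bases: bases considered (e.g. "AC" for DMS).
    min_values: 0 disables the check; a value strictly between 0 and 1 is a
    fraction of the reactive bases; anything else is an absolute count
    (ASSUMPTION: exactly 1 is treated as an absolute count).
    """
    if len(values) != len(sequence):
        raise ValueError("values and sequence must have the same length")
    if min_values <= 0:
        return True  # check disabled (Default: off)
    # Only reactive bases are considered (hypothetical simplification)
    reactive = [v for v, b in zip(values, sequence) if b in reactive_bases]
    valid = sum(1 for v in reactive if not math.isnan(v))
    if min_values < 1:
        required = min_values * len(reactive)
    else:
        required = min_values
    return valid >= required
```

With the DMS example above, passes_min_values(profile, seq, 0.5, "AC") returns False as soon as more than half of the A/C positions are NaN, and True when exactly half of them carry a value.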