RF Correlate calculates pairwise correlations between structure probing experiments. It can be invoked on individual transcripts, on whole XML folders, or on RC files.
Both overall and per-transcript correlations are reported in CSV format.
Note
When directly comparing two XML files, the transcript IDs are not checked, so any two XML files can be correlated directly (provided that they are of the same length).
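As a rough illustration of what a pairwise comparison of two equal-length profiles involves, the sketch below computes a Pearson correlation in plain Python. It is not RF Correlate's actual code: the function name `pearson`, the example values, and the choice to drop positions where either profile is NaN are assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation over positions where both values are numeric.

    Hypothetical helper: positions with a NaN in either profile are
    dropped before the correlation is computed (an assumption).
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return math.nan  # not enough shared values
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)

# Two made-up reactivity profiles of the same length
rep1 = [0.1, math.nan, 0.8, 0.4, math.nan, 0.0]
rep2 = [0.2, 0.5, 0.9, 0.3, math.nan, 0.1]
print(pearson(rep1, rep2))
```

The length check mirrors the only requirement stated above for comparing two XML files directly: nothing about the transcript ID is inspected, only that the two profiles line up position by position.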
Usage
To list the required parameters, simply type:
$ rf-correlate -h
Parameter | Type | Description |
---|---|---|
-p or --processors | int | Number of processors to use (Default: 1) |
-o or --output | string | Output CSV file (Default: rf_correlate.csv) |
-ow or --overwrite | | Overwrites output file (if the specified file already exists) |
-m or --min-values | float | Minimum number of values to calculate correlation (Default: off) Note: if a value between 0 and 1 is provided, this is interpreted as a fraction of the transcript's length |
-s or --skip-overall | | Skips overall experiment correlation calculation (faster) |
-i or --ignore-sequence | | Ignores sequence differences (e.g. SNVs) between the compared transcripts |
-S or --spearman | | Uses Spearman instead of Pearson to calculate correlation |
RC file-specific options | ||
-kb or --keep-bases | string | Bases on which correlation should be calculated (Default: all) Note: this option only affects RC files. For XML files, reactive bases are automatically identified from the reactive attribute |
-mc or --min-coverage | int | Restricts the correlation analysis to bases exceeding this coverage |
-c or --coverage | | Correlation is calculated on the coverage, rather than on the raw RT stop/mutation counts |
-r or --ratio | | Correlation is calculated on the ratio between the RT stop/mutation counts and the coverage, rather than on the raw RT stop/mutation counts |
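The RC file-specific options above determine which per-base values enter the correlation. The sketch below is one possible reading of how they interact, not the tool's implementation: the function `select_values`, its arguments, and the use of `None` to mark excluded positions are all hypothetical, and `min_coverage=None` is assumed to mean that no coverage filter is applied.

```python
def select_values(bases, counts, coverage,
                  keep_bases=None, min_coverage=None, mode="counts"):
    """Pick the per-base values to correlate for one transcript.

    mode="counts"   -> raw RT stop/mutation counts
    mode="coverage" -> coverage (cf. --coverage)
    mode="ratio"    -> counts / coverage (cf. --ratio)
    Excluded positions are returned as None.
    """
    if mode not in ("counts", "coverage", "ratio"):
        raise ValueError(f"unknown mode: {mode}")
    values = []
    for base, count, cov in zip(bases, counts, coverage):
        if keep_bases is not None and base not in keep_bases:
            values.append(None)   # base not requested (cf. --keep-bases)
        elif min_coverage is not None and cov <= min_coverage:
            values.append(None)   # does not exceed the threshold (cf. --min-coverage)
        elif mode == "coverage":
            values.append(float(cov))
        elif mode == "ratio":
            # A ratio is undefined where nothing was covered
            values.append(count / cov if cov > 0 else None)
        else:
            values.append(float(count))
    return values

# Made-up transcript: sequence, per-base counts, and per-base coverage
bases = "ACGU"
counts = [2, 0, 5, 1]
coverage = [10, 20, 4, 0]
print(select_values(bases, counts, coverage,
                    keep_bases="AC", min_coverage=5, mode="ratio"))
```

In this reading the filters are applied before the value is chosen, so a base that is dropped by the base or coverage filter is excluded regardless of the mode.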
Note
When the value specified via --min-values is interpreted as a fraction of the transcript's length, only reactive bases (specified by the reactive attribute for XML files, or via the --keep-bases parameter for RC files) are considered. For example, if a transcript containing 25% of each base has been modified with DMS (which only modifies A/C residues), setting --min-values to 0.5 will cause RF Correlate to skip the transcript if more than 50% of the A/C residues are NaNs (or do not exceed the --min-coverage threshold for RC files).
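The DMS example above can be worked through with a small sketch. The function `passes_min_values` and its arguments are hypothetical, and treating only values strictly between 0 and 1 as fractions is an assumption about the boundary cases, which the description does not spell out.

```python
import math

def passes_min_values(sequence, values, reactive_bases, min_values):
    """Return True if enough reactive positions carry a usable value.

    Only positions whose base is in `reactive_bases` are counted.
    A `min_values` strictly between 0 and 1 is read as a fraction of
    the reactive positions; anything else is read as an absolute count.
    """
    reactive = [v for base, v in zip(sequence, values)
                if base in reactive_bases]
    usable = sum(1 for v in reactive if not math.isnan(v))
    if 0 < min_values < 1:
        required = min_values * len(reactive)
    else:
        required = min_values
    return usable >= required

nan = math.nan
sequence = "ACGUACGU"   # 25% of each base, so 4 A/C positions

# 2 of the 4 A/C positions have values: exactly 50% are NaN, transcript kept
kept = [0.1, nan, nan, nan, 0.3, nan, nan, nan]
# 1 of the 4 A/C positions has a value: more than 50% are NaN, transcript skipped
skipped = [0.1, nan, 0.5, 0.5, nan, nan, 0.5, 0.5]

print(passes_min_values(sequence, kept, "AC", 0.5))
print(passes_min_values(sequence, skipped, "AC", 0.5))
```

Note that the G/U values in the second profile do not help it pass: with DMS only the A/C positions count toward the threshold.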