The RF RCtools module enables easy visualization and manipulation of RC files.
This tool is particularly useful when the same sample is sequenced more than once to increase its coverage. Instead of merging the BAM files and re-running rf-count on the whole dataset (which is very time-consuming), each sequencing run can be processed independently and its RC file simply merged with the RC file from the previous analysis.
Usage
Available tools are: index, view, merge, extract and stats.
| Tool | Description |
|---|---|
| view | Dumps to screen the content of the provided RC file |
| merge | Combines multiple RC files |
| extract | Generates a new RC file, by extracting the regions specified in a BED or GTF annotation |
| index | Indexes RC files |
| stats | Prints per-transcript and global reads mapping statistics |
To list the required parameters, simply type:
$ rf-rctools [tool] -h
| Parameter | Tool | Type | Description |
|---|---|---|---|
| -t or --tab | view | | Switches to tabular output format |
| -i or --index | view, merge or extract | string | RCI index file. Note: if an RCI index is not specified, the program will look in the same directory as the input RC file for a file named after the RC file with one of the following extensions: .rci, .plus.rc.rci, .minus.rc.rci |
| -o or --output | merge or extract | string | Output RC filename (Default: merge.rc or <annotation>.rc) |
| -ow or --overwrite | merge or extract | | Overwrites output file (if the specified file already exists) |
| -s or --blockSize | merge | int | Maximum size of the chromosome/transcript block to read from each RC file (≥1, Default: 1000000). Note: this is particularly useful when merging genome-level RC files, to prevent entire chromosomes from being kept in memory |
| -a or --annotation | extract | string | BED/GTF file containing a list of regions to be extracted (mandatory) |
| -f or --GTFfeature | extract | string | If a GTF file is provided, only entries corresponding to this feature type will be extracted (Default: exon) |
| -b or --GTFattribute | extract | string | If a GTF file is provided, this attribute will be used as the entry ID in the output RC file (Default: transcript_id) |
RCtools "view" output
By default, the view command produces an output structured as follows:
Transcript_1
ATGGGCAGCTATGCA...TGGGCATGCTGGATG
0,0,0,3,1,2,5,9,16,26,10,14,21,899,888,1038,112,96,1135,167,1164,139,161,3520,2522,2075,172,2043,185,205
245496,239926,233144,232804,232485,229422,225754,224062,222318,219039,216337,212885,207928,206206,203534,184536,184118,185854,183831,180871,177687,174523,170546,167506,163845,161977,150523,150637,143787,142784,137815
Transcript_2
GAATTCATGCATGCG...AGCTAGCGGGGATAT
0,0,0,1,0,2,5,30,17,17,15,34,46,32,409,48,509,56,480,499,68,715,677,782,74,1016,988,2035,108,158
512,583,702,783,847,1517,1852,2084,2191,4791,10389,15321,16535,17231,17823,18254,19388,22321,22944,25503,27254,28285,36273,41905,50366,50724,71321,73144,77610,77903
Transcript_n
ATTGCTTCCAATGAA...AATATGGAGACTATG
150,2152,161,3557,3109,137,3077,190,157,3105,3923,3047,3199,158,2931,159,3501,149,3938,159,162,159,177,186,5684,281,4734,3800,6114,4736
504075,499650,493631,489064,480388,478484,477320,468301,462674,457668,438438,428879,418411,411484,404875,404148,403917,402996,409478,408878,398653,394306,390252,370852,360041,361397,359538,359530,359542,363686
in which each transcript is reported as a 4-row entry, with rows ordered as follows:
- Transcript ID
- Transcript sequence
- Number of per-base RT-stops (or mutations)
- Per-base coverage
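The 4-row layout described above is straightforward to load into a script. The following is a minimal sketch, assuming the output of the view command has been captured as text; the names `RCEntry` and `parse_view_output` are hypothetical helpers introduced here for illustration and are not part of RNA Framework. The sample data in the usage note is made up.

```python
from typing import Dict, List, NamedTuple


class RCEntry(NamedTuple):
    """One transcript as reported by the default `view` output (hypothetical container)."""
    sequence: str
    counts: List[int]     # per-base RT-stops (or mutations)
    coverage: List[int]   # per-base coverage


def parse_view_output(text: str) -> Dict[str, RCEntry]:
    """Parse default (non-tabular) view output into a dict keyed by transcript ID."""
    # Blank lines are dropped so that optional separators between entries are tolerated.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 rows per transcript, got %d rows" % len(lines))

    entries: Dict[str, RCEntry] = {}
    for i in range(0, len(lines), 4):
        transcript_id, sequence, counts_row, coverage_row = lines[i:i + 4]
        entries[transcript_id] = RCEntry(
            sequence=sequence,
            counts=[int(x) for x in counts_row.split(",")],
            coverage=[int(x) for x in coverage_row.split(",")],
        )
    return entries
```

For example, `parse_view_output("tx1\nACGT\n0,1,2,3\n10,11,12,13\n")["tx1"].counts` gives the list of per-base counts for the made-up transcript `tx1`. The sketch does not check that the three data rows have the same length, since the example above elides part of each sequence.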
When the -t parameter is specified, the output is instead structured as follows:
Transcript_1
A 0 242
G 0 280
C 0 359
G 3 390
...
A 1038 56642
T 112 65943
T 96 66134
A 1135 74888
Transcript_2
T 185 100294
G 205 100831
G 185 101003
A 1458 101124
...
A 2529 101509
A 2984 101819
G 227 103858
A 2937 105307
Transcript_n
C 0 945
G 13 990
A 3 1064
A 5 1893
...
A 3 2333
G 36 2648
C 25 2993
A 30 14274
in which each transcript is reported as a multi-row entry (with the number of rows equal to the transcript's length). Each row is made of 3 tab-separated fields, ordered as follows:
- Base
- Number of RT-stops (or mutations)
- Coverage
Consecutive entries are separated by a newline.
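The tabular layout can be read line by line: a row with a single field starts a new transcript, and a row with three tab-separated fields describes one base. The following is a minimal sketch under that assumption; `parse_tabular_view` is a hypothetical helper name, and the sample data in the usage note is made up.

```python
from typing import Dict, List, Tuple

# (base, number of RT-stops or mutations, coverage)
BaseRow = Tuple[str, int, int]


def parse_tabular_view(text: str) -> Dict[str, List[BaseRow]]:
    """Parse `view -t` output into {transcript ID: [(base, count, coverage), ...]}."""
    entries: Dict[str, List[BaseRow]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # separator between consecutive entries
        fields = line.split("\t")
        if len(fields) == 1:
            # A single field is taken to be a transcript ID (assumption of this sketch).
            current = fields[0]
            entries[current] = []
        elif len(fields) == 3 and current is not None:
            base, count, coverage = fields
            entries[current].append((base, int(count), int(coverage)))
        else:
            raise ValueError("unexpected row: %r" % raw)
    return entries
```

For example, `parse_tabular_view("tx1\nA\t0\t242\nG\t3\t390\n")` yields one entry, `tx1`, holding two base rows. Elided rows such as the `...` lines in the example above are not real output and would need to be removed before parsing.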
Optionally, the view tool allows specifying one or more transcript IDs (separated by spaces) to visualize:
$ rf-rctools view <file.rc> Transcript_1 Transcript_2 Transcript_n
or a specific range of a given transcript (note: numbering is 0-based):
$ rf-rctools view <file.rc> Transcript_1:1000-2000 Transcript_2:5000-6000
When visualizing a specific transcript or transcript region, providing an RCI index (via the -i parameter) significantly speeds up retrieval of the region of interest. If no index is specified, the program looks in the folder of the RC file for a file with the .rci extension, named after the RC file itself (e.g., if the RC file is named sample.rc, the module will look for the sample.rc.rci index file).
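When calling the view tool from a script, the same lookup rule can be mirrored to decide whether to pass -i explicitly. The sketch below is only an illustration: `default_index_path` and `build_view_command` are hypothetical helpers, and placing the option before the RC file is an assumption based on the other command examples in this page.

```python
import os
from typing import Callable, List, Optional


def default_index_path(rc_path: str,
                       exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return <rc_path>.rci if it exists (e.g. sample.rc -> sample.rc.rci), else None."""
    candidate = rc_path + ".rci"
    return candidate if exists(candidate) else None


def build_view_command(rc_path: str, regions: List[str],
                       exists: Callable[[str], bool] = os.path.isfile) -> List[str]:
    """Build an argument list for `rf-rctools view`, adding -i when an index is found."""
    command = ["rf-rctools", "view"]
    index = default_index_path(rc_path, exists)
    if index is not None:
        # Assumption: options are given before the RC file.
        command += ["-i", index]
    command.append(rc_path)
    command += regions
    return command
```

The resulting list can be handed to `subprocess.run`. The `exists` argument only serves to make the lookup easy to exercise without touching the filesystem.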
Working with RCtools "extract"
Starting from an input RC file, the extract command generates an output RC file containing only the regions from a user-specified BED or GTF annotation file. The tool can handle any BED file format (BED3 through BED12).
For BED3-formatted files, the extracted feature will be named after its coordinates in the output RC file. For BED3 through BED5-formatted files, the feature will be assumed to come from the plus strand. For GTF files, only features of a user-specified type (controlled via the --GTFfeature parameter) are extracted. Features sharing the same attribute value (controlled via the --GTFattribute parameter) are concatenated into a single RC entry. Features sitting on the minus strand are automatically reverse-complemented.
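To make the effect of reverse-complementing concrete, the sketch below shows what it implies for a single region. It is an illustration, not the tool's implementation: `reverse_complement_entry` is a hypothetical helper, and reversing the count and coverage tracks together with the sequence is an assumption (it is what keeps each value aligned with its base).

```python
from typing import List, Tuple

# DNA alphabet only; other characters are left unchanged by translate().
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_entry(sequence: str, counts: List[int],
                             coverage: List[int]) -> Tuple[str, List[int], List[int]]:
    """Reverse-complement a sequence and reverse its per-base tracks accordingly."""
    if not (len(sequence) == len(counts) == len(coverage)):
        raise ValueError("sequence, counts and coverage must have the same length")
    rc_sequence = sequence.translate(_COMPLEMENT)[::-1]
    # Assumption: per-base values follow their base to its new position.
    return rc_sequence, counts[::-1], coverage[::-1]
```

For instance, a plus-strand region `AACG` with counts `[1, 2, 3, 4]` becomes `CGTT` with counts `[4, 3, 2, 1]` when read from the minus strand.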
RNA Framework version 2.8.0 introduced the rf-count-genome module, which handles genome-level RNA structure probing data. When processing directional RNA structure probing experiments, the module generates two RC files, one per genome strand. These two files share the same prefix but have different suffixes (.plus.rc and .minus.rc). When extracting features from genome-level RC files, it is sufficient to pass RCtools the path to the RC files up to the common prefix, and the program will take care of extracting the features from the proper RC file:
$ rf-rctools extract -a annotation.bed /path/to/file
# The command above expects to find a /path/to/file.plus.rc and (optionally) a /path/to/file.minus.rc file
If only the .plus.rc file exists (such as in the case of samples generated using a non-directional library prep strategy), RCtools will extract features for both the plus and minus strands from that file.
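The file selection rule described above can be summarized in a few lines. This is a sketch of the documented behavior only, not the tool's own code; `resolve_strand_rc` is a hypothetical helper name.

```python
import os
from typing import Callable


def resolve_strand_rc(prefix: str, strand: str,
                      exists: Callable[[str], bool] = os.path.isfile) -> str:
    """Pick the RC file to read a feature from, given the common prefix and the strand."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    plus_rc = prefix + ".plus.rc"
    minus_rc = prefix + ".minus.rc"
    if not exists(plus_rc):
        # The .plus.rc file is the one that is always expected.
        raise FileNotFoundError(plus_rc)
    if strand == "-" and exists(minus_rc):
        return minus_rc
    # Plus-strand features, or minus-strand features when no .minus.rc file
    # exists (non-directional library), are read from the .plus.rc file.
    return plus_rc
```

With `/path/to/file` as the prefix, a minus-strand feature resolves to `/path/to/file.minus.rc` when that file exists and to `/path/to/file.plus.rc` otherwise.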