The RF RCtools module enables easy visualization and manipulation of RC files. It allows indexing, viewing, merging, and extracting regions from RC files, as well as computing read mapping statistics on them.
This tool is particularly useful when the same sample is sequenced more than once to increase its coverage. Instead of merging the BAM files and re-running rf-count on the whole dataset (which is very time-consuming), each new sequencing run can be processed independently and then merged with the RC file from the previous analysis.
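To make the idea of merging concrete, here is a minimal Python sketch. It assumes that merging two RC entries for the same transcript amounts to adding their per-base counts and coverage position by position. That assumption, the `merge_entries` helper and the example values are illustrative only and are not taken from RCtools itself.

```python
from typing import List, Tuple

# Hypothetical in-memory representation of one RC entry:
# (sequence, per-base RT-stop/mutation counts, per-base coverage)
Entry = Tuple[str, List[int], List[int]]


def merge_entries(first: Entry, second: Entry) -> Entry:
    """Add counts and coverage of two entries describing the same transcript.

    Assumption: merging is a per-base sum; this is a sketch, not RCtools' code.
    """
    seq_a, counts_a, cov_a = first
    seq_b, counts_b, cov_b = second
    if seq_a != seq_b:
        raise ValueError("entries do not share the same sequence")
    counts = [a + b for a, b in zip(counts_a, counts_b)]
    coverage = [a + b for a, b in zip(cov_a, cov_b)]
    return seq_a, counts, coverage


# Made-up values for two sequencing runs of the same sample
run1 = ("ACGT", [0, 1, 0, 3], [10, 12, 12, 15])
run2 = ("ACGT", [1, 0, 2, 1], [5, 6, 6, 8])
merged = merge_entries(run1, run2)
```

In practice the merge tool does this on the RC files directly, so no such code is needed to use it.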
Usage
Available tools are: index, view, merge, extract and stats.
Tool | Description |
---|---|
view | Dumps to screen the content of the provided RC file |
merge | Combines multiple RC files |
extract | Generates a new RC file, by extracting the regions specified in a BED or GTF annotation |
index | Generates RCI index |
stats | Prints per-transcript and global reads mapping statistics |
To list the required parameters, simply type:
$ rf-rctools [tool] -h
Parameter | Tool | Type | Description |
---|---|---|---|
-t or --tab | view | | Switches to tabular output format |
-o or --output | merge or extract | string | Output RC filename (Default: merge.rc or <annotation>.rc) |
-ow or --overwrite | merge or extract | | Overwrites output file (if the specified file already exists) |
-i or --index | merge | string[,string] | A comma-separated (no spaces) list of RCI index files for the provided RC files. Note: RCI files must be provided in the same order as RC files. If a single RCI file is specified along with multiple RC files, it will be used for all of them. |
-T or --tmp-dir | merge | string | Temporary directory (Default: /tmp) |
-a or --annotation | extract | string | BED/GTF file containing a list of regions to be extracted (mandatory) |
-f or --GTFfeature | extract | string | If a GTF file is provided, only entries corresponding to this feature type will be extracted (Default: exon) |
-b or --GTFattribute | extract | string | If a GTF file is provided, this attribute will be used as the entry ID in the output RC file (Default: transcript_id) |
RCtools "view" output
By default, the view command produces an output structured as follows:
Transcript_1
ATGGGCAGCTATGCA...TGGGCATGCTGGATG
0,0,0,3,1,2,5,9,16,26,10,14,21,899,888,1038,112,96,1135,167,1164,139,161,3520,2522,2075,172,2043,185,205
245496,239926,233144,232804,232485,229422,225754,224062,222318,219039,216337,212885,207928,206206,203534,184536,184118,185854,183831,180871,177687,174523,170546,167506,163845,161977,150523,150637,143787,142784,137815
Transcript_2
GAATTCATGCATGCG...AGCTAGCGGGGATAT
0,0,0,1,0,2,5,30,17,17,15,34,46,32,409,48,509,56,480,499,68,715,677,782,74,1016,988,2035,108,158
512,583,702,783,847,1517,1852,2084,2191,4791,10389,15321,16535,17231,17823,18254,19388,22321,22944,25503,27254,28285,36273,41905,50366,50724,71321,73144,77610,77903
Transcript_n
ATTGCTTCCAATGAA...AATATGGAGACTATG
150,2152,161,3557,3109,137,3077,190,157,3105,3923,3047,3199,158,2931,159,3501,149,3938,159,162,159,177,186,5684,281,4734,3800,6114,4736
504075,499650,493631,489064,480388,478484,477320,468301,462674,457668,438438,428879,418411,411484,404875,404148,403917,402996,409478,408878,398653,394306,390252,370852,360041,361397,359538,359530,359542,363686
in which each transcript is reported as a four-row entry, with rows ordered as follows:
- Transcript ID
- Transcript sequence
- Number of per-base RT-stops (or mutations)
- Per-base coverage
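The four-row layout described above can be read back with a few lines of Python. This is a sketch only: the `RCEntry` type, the `parse_view_output` function and the `Tx_demo` record are hypothetical names and data introduced for illustration, and the parser assumes full (non-elided) sequences, so each row has as many values as the sequence has bases.

```python
from typing import Dict, Iterable, List, NamedTuple


class RCEntry(NamedTuple):
    sequence: str
    counts: List[int]    # per-base RT-stops (or mutations)
    coverage: List[int]  # per-base coverage


def parse_view_output(lines: Iterable[str]) -> Dict[str, RCEntry]:
    """Parse the default (non-tabular) output of the view tool."""
    # Ignore blank lines, which may separate consecutive entries
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected a multiple of 4 non-empty rows")
    entries: Dict[str, RCEntry] = {}
    for i in range(0, len(rows), 4):
        tid, seq, counts_row, cov_row = rows[i:i + 4]
        counts = [int(x) for x in counts_row.split(",")]
        coverage = [int(x) for x in cov_row.split(",")]
        if not (len(seq) == len(counts) == len(coverage)):
            raise ValueError(f"{tid}: rows have different lengths")
        entries[tid] = RCEntry(seq, counts, coverage)
    return entries


# Made-up record, not real data
example = """Tx_demo
ACGT
0,1,0,3
10,12,12,15
"""
entries = parse_view_output(example.splitlines())
```

The length check catches truncated or misaligned records early, before any per-base calculation is attempted.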
When the -t parameter is specified, the output is instead structured as follows:
Transcript_1
A 0 242
G 0 280
C 0 359
G 3 390
...
A 1038 56642
T 112 65943
T 96 66134
A 1135 74888
Transcript_2
T 185 100294
G 205 100831
G 185 101003
A 1458 101124
...
A 2529 101509
A 2984 101819
G 227 103858
A 2937 105307
Transcript_n
C 0 945
G 13 990
A 3 1064
A 5 1893
...
A 3 2333
G 36 2648
C 25 2993
A 30 14274
in which each transcript is reported as a multi-row entry (with the number of rows equal to the transcript's length). Each row is made of 3 tab-separated fields, ordered as follows:
- Base
- Number of RT-stops (or mutations)
- Coverage
Consecutive entries are separated by a newline.
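The tabular layout can be parsed along the same lines. The sketch below assumes that a line with a single field is a transcript ID and that every other non-blank line holds exactly three tab-separated fields. The `parse_tabular_view` function and the demo records are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int]  # (base, RT-stops or mutations, coverage)


def parse_tabular_view(lines: Iterable[str]) -> Dict[str, List[Row]]:
    """Parse the tabular (-t) output of the view tool."""
    entries: Dict[str, List[Row]] = {}
    current = None
    for line in lines:
        if not line.strip():
            continue  # blank line between consecutive entries
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 1:
            # A single field starts a new transcript entry
            current = fields[0]
            entries[current] = []
        elif len(fields) == 3 and current is not None:
            base, count, cov = fields
            entries[current].append((base, int(count), int(cov)))
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return entries


# Made-up records, not real data
demo = "Tx_demo\nA\t0\t242\nG\t3\t390\n\nTx_other\nC\t1\t50\n"
table = parse_tabular_view(demo.splitlines())
```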
If a comma (or semicolon) separated list of transcript IDs is provided, only those transcripts will be shown in the output (e.g. rf-rctools view -i index.rci input.rc 'Transcript_2'):
Transcript_2
GAATTCATGCATGCG...AGCTAGCGGGGATAT
0,0,0,1,0,2,5,30,17,17,15,34,46,32,409,48,509,56,480,499,68,715,677,782,74,1016,988,2035,108,158
512,583,702,783,847,1517,1852,2084,2191,4791,10389,15321,16535,17231,17823,18254,19388,22321,22944,25503,27254,28285,36273,41905,50366,50724,71321,73144,77610,77903
Multiple transcript IDs can be passed to the view tool as a single quoted argument, separated by either commas or semicolons:
$ rf-rctools view <file.rc> "Transcript_1;Transcript_2,Transcript_n"
Working with RCtools "extract"
Starting from an input RC file, the extract command generates an output RC file containing only the regions from a user-specified BED or GTF annotation file. The tool can handle any BED file format (BED3 through BED12).
For BED3-formatted files, the extracted feature will be named after its coordinates in the output RC file. For BED3 through BED5-formatted files, the feature will be assumed to come from the plus strand. For GTF files, only features of a user-specified type (controlled via the --GTFfeature parameter) are extracted. Features sharing the same attribute value (controlled via the --GTFattribute parameter) are concatenated into a single RC entry. Features sitting on the minus strand are automatically reverse-complemented.
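As an illustration of what reverse-complementing an RC entry implies, the following Python sketch reverses and complements the sequence, and reverses the per-base counts and coverage so that each value stays attached to its base. The `reverse_complement_entry` helper and the example values are hypothetical, and the sketch assumes sequences made only of A, C, G, T and N; it is not RCtools' own implementation.

```python
from typing import List, Tuple

# Complement table for DNA-alphabet bases (assumption: no other symbols occur)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_entry(
    sequence: str, counts: List[int], coverage: List[int]
) -> Tuple[str, List[int], List[int]]:
    """Return a minus-strand view of a plus-strand per-base record."""
    if not (len(sequence) == len(counts) == len(coverage)):
        raise ValueError("sequence, counts and coverage must have equal length")
    rc_sequence = sequence.translate(_COMPLEMENT)[::-1]
    # Per-base values are only reversed: they describe positions, not bases
    return rc_sequence, counts[::-1], coverage[::-1]


# Made-up record, not real data
rc_seq, rc_counts, rc_cov = reverse_complement_entry(
    "AACG", [1, 2, 3, 4], [10, 20, 30, 40]
)
```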
RNA Framework version 2.8.0 introduced the rf-count-genome module, which handles genome-level RNA structure probing data. When processing directional RNA structure probing experiments, the module generates two RC files, one per genome strand. These two files have the same prefix, but different suffixes (.plus.rc and .minus.rc). When extracting features from genome-level RC files, it is sufficient to pass RCtools the path to the RC files up to the common prefix, and the program will take care of extracting the features from the proper RC file:
$ rf-rctools extract -a annotation.bed /path/to/file
# The command above expects to find a /path/to/file.plus.rc and (optionally) a /path/to/file.minus.rc file
If only the .plus.rc file exists (such as in the case of samples generated using a non-directional library prep strategy), RCtools will extract features for both the plus and minus strands from that file.
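The prefix-based lookup described above can be sketched in Python as follows. The `resolve_strand_files` function is a hypothetical helper written for illustration; it mirrors the documented behaviour (a required .plus.rc file, an optional .minus.rc file, and a fallback to the plus file for both strands) but is not RCtools' actual code.

```python
import tempfile
from pathlib import Path
from typing import Dict


def resolve_strand_files(prefix: str) -> Dict[str, Path]:
    """Map each strand to the RC file that features should be extracted from."""
    plus = Path(f"{prefix}.plus.rc")
    minus = Path(f"{prefix}.minus.rc")
    if not plus.is_file():
        raise FileNotFoundError(f"missing required file: {plus}")
    # Without a minus-strand file, both strands are read from the plus file
    return {"+": plus, "-": minus if minus.is_file() else plus}


# Demonstration with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "sample")
    Path(prefix + ".plus.rc").touch()
    unstranded = resolve_strand_files(prefix)  # only .plus.rc exists
    Path(prefix + ".minus.rc").touch()
    stranded = resolve_strand_files(prefix)    # both files exist
```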