The RF Index tool automatically generates a Bowtie reference index, which is then used by the RF Map module for read mapping.
This tool requires an internet connection, since it queries the UCSC Genome database to obtain transcript annotations and the reference genome sequence. Alternatively, RF Index can be used to retrieve prebuilt indexes from RNAFramework.com.
Usage
To list the required parameters, simply type:
$ rf-index -h
Parameter | Type | Description |
---|---|---|
-b2 or --bowtie2 | | Generates/retrieves a Bowtie v2 index (Default: Bowtie v1) |
-p or --processors | int | Number of processors to use (Default: 1) |
-o or --output-dir | string | Bowtie index output directory (Default: automatically defined in index retrieval mode, <assembly>_<annotation>_<bowtie version> in index building mode) |
-ow or --overwrite | | Overwrites the output directory if it already exists |
Prebuilt indexes retrieval mode | | |
-lp or --list-prebuilt | | Lists available RNA Framework prebuilt reference indexes |
-pb or --prebuilt | int | Retrieves the prebuilt reference index with the given ID (>=1, Default: none). Note: to obtain a list of available prebuilt indexes, use -lp (or --list-prebuilt) |
Reference building mode | | |
-H or --host | string | UCSC server hostname (Default: genome-mysql.cse.ucsc.edu) |
-P or --port | int | UCSC server port (Default: 3306) |
-g or --genome-assembly | string | Genome assembly for the species of interest (Default: mm9). For a complete list of the assemblies available from UCSC, please refer to the UCSC website (https://genome.ucsc.edu/FAQ/FAQreleases.html) |
-la or --list-annotations | | Lists available UCSC gene annotation tables |
-a or --annotation | string | Name of the UCSC table containing the gene annotation (Default: refFlat). Note: for a complete list of the tables available for the chosen assembly, please either use -la (or --list-annotations), or refer to the UCSC website (https://genome.ucsc.edu/cgi-bin/hgTables) |
-n or --gene-name | | If available, gene name/symbol will be used (see UCSC tables' "name2"/"geneName" columns) |
-co or --coding-only | | Builds reference index using only protein-coding transcripts |
-no or --noncoding-only | | Builds reference index using only non-coding transcripts |
-u or --unspliced | | Builds reference index using pre-mRNA sequences (including introns) |
-t or --timeout | int | Connection timeout in seconds (Default: 180) |
-r or --reference | string | Path to a FASTA file containing chromosome (or scaffold) sequences for the chosen genome assembly. Note: if no file is specified, RF Index will try to obtain sequences from the UCSC DAS server. This process may take up to several hours, depending on your connection speed. |
-b or --bowtie-build | string | Path to bowtie-build (or bowtie2-build) executable (Default: assumes bowtie-build/bowtie2-build is in PATH) |
-e or --bedtools | string | Path to bedtools executable (Default: assumes bedtools is in PATH) |
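
The options above can be combined as in the following sketch, which covers both the prebuilt index retrieval mode and the reference building mode. It only uses flags listed in the table; the values (the hg38 assembly, 4 processors, the prebuilt index ID 1, and the output directory names) are placeholders chosen for illustration, and combining -g with -la is an assumption based on the description of -a. The `run` helper is hypothetical and not part of RNA Framework: it prints each command, and only executes it when RUN=1 is set.

```shell
#!/usr/bin/env bash
# Illustrative rf-index invocations, built only from the options in the table.
# All values (hg38, ID 1, directory names, 4 processors) are placeholders.

# Hypothetical helper: print the command; execute it only when RUN=1.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Warn about tools that are expected in PATH by default (see -b and -e).
for tool in rf-index bowtie-build bowtie2-build bedtools; do
    command -v "$tool" >/dev/null 2>&1 || echo "warning: $tool not found in PATH" >&2
done

# Prebuilt index retrieval mode
run rf-index -lp                       # list the available prebuilt indexes
run rf-index -pb 1 -o prebuilt_index   # retrieve the prebuilt index with ID 1

# Reference building mode
run rf-index -g hg38 -la               # list annotation tables (assumed usage)
run rf-index -g hg38 -a refFlat -co -b2 -p 4 -o hg38_refFlat_coding_bt2
```

Run as is, the script only prints the four commands, which makes it easy to review them before executing anything with RUN=1.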
Note
For experiments conducted on synthetic RNAs (or custom RNA pools), a reference can be generated by directly invoking the bowtie-build command on a FASTA file lexicographically sorted by sequence ID.
To prepare a custom Bowtie index, run:
# Sort reference FASTA file by sequence ID (the first token of each header line)
# Each record is first collapsed onto one line, sorted, then restored to FASTA
$ awk 'BEGIN{RS=">"} NR>1 {gsub("\n", "\t"); print ">"$0}' reference_unsorted.fa | \
LC_ALL=C sort -k 1,1 | \
awk '{sub("\t", "\n"); gsub("\t", ""); print $0}' > reference_sorted.fa
# Build a Bowtie index
$ bowtie-build reference_sorted.fa reference_sorted
# Alternatively, build a Bowtie v2 index
$ bowtie2-build reference_sorted.fa reference_sorted
$ ls -l
-rwxrwxrwx 1 danny danny 96041105 5 mar 10.50 reference_sorted.1.ebwt
-rwxrwxrwx 1 danny danny 37313744 5 mar 10.50 reference_sorted.2.ebwt
-rwxrwxrwx 1 danny danny 1844468 5 mar 10.28 reference_sorted.3.ebwt
-rwxrwxrwx 1 danny danny 74627475 5 mar 10.28 reference_sorted.4.ebwt
-rwxrwxrwx 1 danny danny 302198817 5 mar 10.28 reference_sorted.fa
-rwxrwxrwx 1 danny danny 96041105 5 mar 11.11 reference_sorted.rev.1.ebwt
-rwxrwxrwx 1 danny danny 37313744 5 mar 11.11 reference_sorted.rev.2.ebwt
-rwxrwxrwx 1 danny danny 302198817 5 mar 10.28 reference_unsorted.fa
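
As a sanity check, it can be useful to confirm that a FASTA file really is sorted by sequence ID before (or after) indexing it. The sketch below is one possible way to do this and is not part of RNA Framework: the `check_fasta_sorted` function and the toy file names are hypothetical. It takes the first token of each header line as the sequence ID and relies on `sort -c -u` to verify byte order and the absence of duplicate IDs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the sequence IDs of a FASTA file are in
# lexicographic (byte) order and contain no duplicates.
check_fasta_sorted() {
    # Keep header lines, take the first token without the leading ">",
    # then let sort check the order (-c) and uniqueness (-u) of the IDs.
    grep '^>' "$1" | awk '{print substr($1, 2)}' | LC_ALL=C sort -c -u
}

# Toy input files, created in a temporary directory for demonstration only
tmpdir=$(mktemp -d)
printf '>rna_A\nGGGG\n>rna_B some description\nUUUU\n' > "$tmpdir/sorted.fa"
printf '>rna_B\nUUUU\n>rna_A\nGGGG\n' > "$tmpdir/unsorted.fa"

for fa in "$tmpdir/sorted.fa" "$tmpdir/unsorted.fa"; do
    if check_fasta_sorted "$fa" 2>/dev/null; then
        echo "$(basename "$fa"): sorted by sequence ID"
    else
        echo "$(basename "$fa"): NOT sorted (or duplicate IDs found)"
    fi
done

rm -rf "$tmpdir"
```

On a real reference, the same function can be called directly, for example `check_fasta_sorted reference_sorted.fa`. Note that a file without any header line passes the check trivially.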