The RF Index tool automatically generates a Bowtie reference index, which is then used by the RF Map module for read mapping.
This tool requires an internet connection, since it queries the UCSC Genome database to obtain the transcript annotation and the reference genome sequence. Alternatively, RF Index can be used to retrieve prebuilt indexes from RNAFramework.com.

Usage

To list the required parameters, simply type:

$ rf-index -h
| Parameter | Type | Description |
| --- | --- | --- |
| -b2 or --bowtie2 | | Generates/retrieves a Bowtie v2 index (Default: Bowtie v1) |
| -p or --processors | int | Number of processors to use (Default: 1) |
| -o or --output-dir | string | Bowtie index output directory (Default: automatically defined in index retrieval mode, <assembly>_<annotation>_<bowtie version> in index building mode) |
| -ow or --overwrite | | Overwrites the output directory if it already exists |
| Prebuilt indexes retrieval mode | | |
| -lp or --list-prebuilt | | Lists available RNA Framework prebuilt reference indexes |
| -pb or --prebuilt | int | Retrieves the prebuilt reference index with the given ID (>=1, Default: none). Note: to obtain a list of available prebuilt indexes, use -lp (or --list-prebuilt) |
| Reference building mode | | |
| -H or --host | string | UCSC server hostname (Default: genome-mysql.cse.ucsc.edu) |
| -P or --port | int | UCSC server port (Default: 3306) |
| -g or --genome-assembly | string | Genome assembly for the species of interest (Default: mm9). For a complete list of available UCSC assemblies, please refer to the UCSC website (https://genome.ucsc.edu/FAQ/FAQreleases.html) |
| -la or --list-annotations | | Lists available UCSC gene annotation tables |
| -a or --annotation | string | Name of the UCSC table containing the gene annotation (Default: refFlat). Note: for a complete list of tables available for the chosen assembly, either use -la (or --list-annotations), or refer to the UCSC website (https://genome.ucsc.edu/cgi-bin/hgTables) |
| -n or --gene-name | | If available, the gene name/symbol will be used (see the "name2"/"geneName" columns of the UCSC tables) |
| -co or --coding-only | | Builds the reference index using only protein-coding transcripts |
| -no or --noncoding-only | | Builds the reference index using only non-coding transcripts |
| -u or --unspliced | | Builds the reference index using pre-mRNA sequences (including introns) |
| -t or --timeout | int | Connection timeout in seconds (Default: 180) |
| -r or --reference | string | Path to a FASTA file containing chromosome (or scaffold) sequences for the chosen genome assembly. Note: if no file is specified, RF Index will try to obtain sequences from the UCSC DAS server. This process may take hours, depending on your connection speed |
| -b or --bowtie-build | string | Path to the bowtie-build (or bowtie2-build) executable (Default: assumes bowtie-build/bowtie2-build is in PATH) |
| -e or --bedtools | string | Path to the bedtools executable (Default: assumes bedtools is in PATH) |
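
As an illustration of how the options listed above combine, the following sketch prints one command line per mode so it can be inspected before anything is executed. Only flags documented above are used. The `run` wrapper, the `DRY_RUN` variable, the index ID (1), the processor count, and the output directory name are placeholders chosen for this example, not defaults of RF Index.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper (not part of RF Index):
# prints the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Prebuilt indexes retrieval mode: list the available indexes, then fetch one by ID.
# The ID 1 is a placeholder; pick a real ID from the output of -lp.
run rf-index -lp
run rf-index -pb 1

# Reference building mode: list the annotation tables for an assembly,
# then build a Bowtie v2 index from protein-coding transcripts only.
# mm9_refFlat_coding is an illustrative output directory name.
run rf-index -g mm9 -la
run rf-index -g mm9 -a refFlat -co -b2 -p 4 -o mm9_refFlat_coding
```

Running the script as is only prints the four command lines. Setting `DRY_RUN=0` executes them, which requires rf-index in PATH and, as noted above, an internet connection.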

Note

For experiments conducted on synthetic RNAs (or custom RNA pools), a reference can be generated by invoking the bowtie-build command directly on a FASTA file sorted lexicographically by sequence ID.
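
Before building, it can be worth checking whether a FASTA file is already in the required order. The following is a minimal sketch, assuming that the sequence ID is the header text up to the first whitespace and that "lexicographically" means byte-wise (C locale) order. `fasta_is_sorted` is a hypothetical helper name, not an RNA Framework command.

```shell
#!/bin/sh
# Returns 0 if the sequence IDs of FASTA file "$1" are in byte-wise
# lexicographic order, non-zero otherwise (duplicate IDs are not detected).
fasta_is_sorted() {
    # Extract the ID from each header line (drop the leading ">"),
    # then let sort -c verify the order without rewriting anything.
    awk '/^>/ {print substr($1, 2)}' "$1" | LC_ALL=C sort -c 2>/dev/null
}

# Usage: report the state of the file if it is present in the current directory
if [ -f reference_unsorted.fa ]; then
    if fasta_is_sorted reference_unsorted.fa; then
        echo "already sorted"
    else
        echo "needs sorting"
    fi
fi
```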

To prepare a custom Bowtie index, run:

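# Sort reference FASTA file by sequence ID:
#   1. awk puts each record on a single line (header and sequence lines joined by tabs)
#   2. sort orders the records byte-wise (LC_ALL=C) on the first field, i.e. ">" plus the sequence ID
#   3. awk restores the header line and joins the sequence into a single line
$ awk 'BEGIN{RS=">"} NR>1 {gsub("\n", "\t"); print ">"$0}' reference_unsorted.fa | \
  LC_ALL=C sort -k 1,1 | \
  awk '{sub("\t", "\n"); gsub("\t", ""); print $0}' > reference_sorted.fa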

# Build a Bowtie index
$ bowtie-build reference_sorted.fa reference_sorted

# Alternatively, build a Bowtie v2 index
$ bowtie2-build reference_sorted.fa reference_sorted

# List the resulting files (Bowtie v1 index shown; Bowtie v2 index files end in .bt2)
$ ls -l

  -rwxrwxrwx 1 danny danny  96041105 5 mar 10.50 reference_sorted.1.ebwt
  -rwxrwxrwx 1 danny danny  37313744 5 mar 10.50 reference_sorted.2.ebwt
  -rwxrwxrwx 1 danny danny   1844468 5 mar 10.28 reference_sorted.3.ebwt
  -rwxrwxrwx 1 danny danny  74627475 5 mar 10.28 reference_sorted.4.ebwt
  -rwxrwxrwx 1 danny danny 302198817 5 mar 10.28 reference_sorted.fa
  -rwxrwxrwx 1 danny danny  96041105 5 mar 11.11 reference_sorted.rev.1.ebwt
  -rwxrwxrwx 1 danny danny  37313744 5 mar 11.11 reference_sorted.rev.2.ebwt
  -rwxrwxrwx 1 danny danny 302198817 5 mar 10.28 reference_unsorted.fa
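
A quick way to confirm that the build completed is to check that all six index files are present and non-empty. The sketch below assumes the standard small-index suffixes (`.ebwt` for Bowtie v1, as in the listing above, and `.bt2` for Bowtie v2). Very large references produce `.ebwtl` or `.bt2l` files instead, which this check does not cover. `index_complete` is a hypothetical helper name.

```shell
#!/bin/sh
# Returns 0 if all six Bowtie index files exist and are non-empty.
#   $1: index basename (e.g. reference_sorted)
#   $2: suffix, "ebwt" for Bowtie v1 (default) or "bt2" for Bowtie v2
index_complete() {
    base=$1
    ext=${2:-ebwt}
    for part in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "$base.$part.$ext" ]; then
            echo "missing or empty: $base.$part.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Usage: check the Bowtie v1 index built from reference_sorted.fa
if index_complete reference_sorted ebwt; then
    echo "Bowtie v1 index looks complete"
fi
```

This only checks that the files exist. It does not validate their contents.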