RF Mutate designs mutations aimed at disrupting target structure motifs. It can also design compensatory (rescue) mutations aimed at restoring the wild-type structure, and it can design mutations within ORFs without altering the encoded amino acid sequence.
RF Mutate requires one or more structure files (either in dot-bracket or CT format) and a motif file, containing the list of the structure motifs to mutagenize. Optionally, an ORF file can be provided, indicating whether (and where) an ORF is present within the analyzed transcripts; in this way, if a target motif overlaps an ORF, RF Mutate can introduce mutations in such a way that the encoded protein remains unchanged. In case no ORF file is provided, RF Mutate can automatically identify the longest ORF (if needed).
Mutagenesis results are reported in XML format, one file per motif.

Usage

To list the required parameters, simply type:

$ rf-mutate -h
Parameter Type Description
-p or --processors int Number of processors to use (Default: 1)
-o or --output-dir string Output directory (Default: rf_mutate/)
-ow or --overwrite Overwrites the output directory if it already exists
-mf or --motif-file string Path to a file containing the list of motifs to mutate (mandatory)
-tf or --target-file string Path to a file containing a list of target structures the motifs should fold into upon mutagenesis (optional)
-of or --orf-file string Path to a file containing transcript ORFs (optional)
-lo or --longest-orf Automatically finds the longest ORF
-mo or --min-orf-length int Minimum length (in aa) to select the longest ORF (requires -lo, Default: 50)
-als or --alt-start Longest ORF is allowed to start with alternative start codons (requires -lo)
-ans or --any-start Longest ORF is allowed to start with any codon (requires -lo)
-gc or --genetic-code int Genetic code table for the reference organism (1-33, Default: 1)
Note: for a detailed list of the available genetic code tables, please refer to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
-ec or --exclude-codons string A comma (or semicolon) separated list of rare codons to be avoided
-md or --min-distance float Minimum (fractional) base-pair distance between wild-type and mutant (0-1, Default: 0.5)
-t or --tolerance float Maximum (fractional) base-pair distance between wild-type and rescue (0-1, Default: 0.2)
-mi or --max-iterations int Maximum number of iterations (>0, Default: 1000)
-me or --max-evaluate int Maximum number of mutants to evaluate (>0, Default: 1000)
-mr or --max-results int Maximum number of mutants to report per motif (Default: all)
-nm or --n-mutations int Number of bases (or codons) to simultaneously mutate (>0, Default: 1)
-nr or --no-rescue Disables design of rescue mutations
-ne or --no-ensemble-prob Disables evaluation of mutant/rescue Boltzmann ensemble
-vrf or --vienna-rnafold string Path to ViennaRNA RNAfold executable (Default: assumes RNAfold is in PATH)
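
To illustrate what the -gc and -ec parameters imply for mutations within ORFs, the following minimal Python sketch lists the synonymous alternatives available for a codon under the Standard Code (table 1), skipping any excluded codons. It is not RF Mutate's implementation: the names CODON_TABLE and synonymous_codons are hypothetical, and only genetic code table 1 is hard-coded here.

```python
from itertools import product

# Standard Code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def normalize(codon):
    """Upper-case a codon and convert RNA (U) to DNA (T) notation."""
    return codon.upper().replace("U", "T")


def synonymous_codons(codon, exclude=()):
    """Return the other codons encoding the same amino acid as `codon`.

    `exclude` plays the role of a rare-codon list (hypothetical helper,
    analogous in spirit to the -ec parameter).
    """
    codon = normalize(codon)
    excluded = {normalize(c) for c in exclude}
    amino_acid = CODON_TABLE[codon]
    return sorted(
        c for c, aa in CODON_TABLE.items()
        if aa == amino_acid and c != codon and c not in excluded
    )
```

For example, synonymous_codons("CUG", exclude=["CUA", "UUA"]) returns the three remaining leucine codons, while a codon such as AUG has no synonymous alternative and therefore cannot be mutated without changing the protein.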


Genetic code tables

Choice of the genetic code affects the identification of the longest ORF (different organisms use different alternative START and/or STOP codons). The following tables are available:

Table Description
1 Standard Code
2 Vertebrate Mitochondrial Code
3 Yeast Mitochondrial Code
4 Mold, Protozoan, and Coelenterate Mitochondrial Code and Mycoplasma/Spiroplasma Code
5 Invertebrate Mitochondrial Code
6 Ciliate, Dasycladacean and Hexamita Nuclear Code
9 Echinoderm and Flatworm Mitochondrial Code
10 Euplotid Nuclear Code
11 Bacterial, Archaeal and Plant Plastid Code
12 Alternative Yeast Nuclear Code
13 Ascidian Mitochondrial Code
14 Alternative Flatworm Mitochondrial Code
16 Chlorophycean Mitochondrial Code
21 Trematode Mitochondrial Code
22 Scenedesmus obliquus Mitochondrial Code
23 Thraustochytrium Mitochondrial Code
24 Pterobranchia Mitochondrial Code
25 Candidate Division SR1 and Gracilibacteria Code
26 Pachysolen tannophilus Nuclear Code
27 Karyorelict Nuclear Code
28 Condylostoma Nuclear Code
29 Mesodinium Nuclear Code
30 Peritrich Nuclear Code
31 Blastocrithidia Nuclear Code
33 Cephalodiscidae Mitochondrial UAA-Tyr Code


For a detailed description of each genetic code table, please refer to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

Motif file

The motif file lists the target structure motifs to mutagenize.
It is composed of one or more lines, each one reporting the transcript ID and a comma (or semicolon) separated list of either motif start coordinates (0-based), or motifs in dot-bracket notation:

Transcript_1;25,67
Transcript_2,((((((....))))));44
Transcript_3;0,99;(((...(((...)))...))),189

Motif start positions must correspond to the first base-paired residue in a helix. In the following example, valid start positions are marked in green:

[Figure: Helix start]

Note

The name of the transcripts in the motif file must match the input file names (e.g. "Transcript#1" expects a file named "Transcript#1.ct", "Transcript#1.db", or "Transcript#1.fasta")

Important

If a dot-bracket structure is provided and it occurs more than once in the target transcript, only the first occurrence will be considered
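
The motif file layout described above can be checked before running RF Mutate with a small parser. The sketch below is only an assumption about how such a line may be read (the function name parse_motif_line is hypothetical): it splits on commas and semicolons, treats the first field as the transcript ID, and classifies each remaining field as either a 0-based start coordinate or a dot-bracket motif.

```python
import re


def parse_motif_line(line):
    """Split one motif-file line into (transcript_id, motifs).

    Each motif is returned either as an int (0-based start coordinate)
    or as a str (dot-bracket notation).
    """
    fields = [f.strip() for f in re.split(r"[;,]", line.strip()) if f.strip()]
    if len(fields) < 2:
        raise ValueError(f"expected a transcript ID and at least one motif: {line!r}")

    transcript_id, entries = fields[0], fields[1:]
    motifs = []
    for entry in entries:
        if entry.isdigit():
            motifs.append(int(entry))          # start coordinate
        elif set(entry) <= set("()."):
            motifs.append(entry)               # dot-bracket motif
        else:
            raise ValueError(f"unrecognized motif entry: {entry!r}")
    return transcript_id, motifs
```

Because commas and semicolons are used as separators, this sketch assumes that transcript IDs contain neither character.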


ORF file

The ORF file allows specifying whether an ORF is present at a given position of the transcript.
It is composed of one or more lines, each one reporting the transcript ID and either the coordinates of the ORF (0-based, inclusive), or the amino acid sequence of the encoded protein (either full or partial):

Transcript_1;48-254
Transcript_2,122
Transcript_3;MYGAAAHKKLDAGASS

Note

Currently, a single ORF per transcript is supported. RF Mutate cannot deal with multiple/overlapping ORFs.

When a single value is provided (e.g. Transcript_2 in the above example), this will be treated as the start coordinate and the end coordinate will be automatically identified.
When providing an amino acid sequence, either the full sequence or just a portion of it can be provided. The sequence will then be automatically extended to the closest in-frame STOP codon (both upstream and downstream). This way, RF Mutate will be able to identify the underlying ORF, hence allowing target motif disruption without altering the encoded protein.
Consider the following example:

0        9        19       29       39       49       59       69 
|--------|--------|--------|--------|--------|--------|--------|-----
 M  G  I  Y  Q  I  L  A  I  Y  S  T  V  A  S  S  L  V  L  L  V  S  *
ATGGGGATCTATCAGATTCTGGCGATCTACTCAACTGTCGCCAGTTCACTGGTGCTTTTGGTCTCCTAA
..(((((((((((((...((((((((..........))))))))....)))))).....)))))))...

Here, it is sufficient to indicate in the ORF file just the starting "MGIYQ" portion (for example) of the amino acid sequence for RF Mutate to identify the full underlying ORF.
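
The extension of a partial amino acid sequence to the enclosing ORF can be sketched as follows. This is a minimal illustration under stated assumptions, not RF Mutate's actual code: the function names are hypothetical, only the Standard Code (table 1) is used, the three forward frames are searched in order, and the returned coordinates are 0-based, inclusive, and exclude the STOP codon.

```python
from itertools import product

# Standard Code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate `seq` in the given forward frame (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_orf(seq, peptide):
    """Locate `peptide` and extend it to the closest in-frame STOP codons.

    Returns (start, end) nucleotide coordinates, or None if the peptide
    is not encoded in any forward frame.
    """
    for frame in range(3):
        protein = translate(seq, frame)
        hit = protein.find(peptide)
        if hit < 0:
            continue
        # First codon after the closest upstream STOP (or the frame start).
        first = protein.rfind("*", 0, hit) + 1
        # Last codon before the closest downstream STOP (or the frame end).
        stop = protein.find("*", hit)
        last = (stop if stop >= 0 else len(protein)) - 1
        return frame + 3 * first, frame + 3 * last + 2
    return None
```

Applied to the example sequence above, both find_orf(seq, "MGIYQ") and find_orf(seq, "STVAS") return (0, 65), that is, the whole coding region up to (but not including) the terminal UAA.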

Target file

The target file allows specifying a target structure each motif should fold into upon mutagenesis.
It is composed of one or more lines, each one reporting the transcript ID, the motif position (as indicated in the motif file) and the target structure in dot-bracket notation. The motif position and the target structure should be separated by a colon, while consecutive motifs on the same transcript must be separated by a semicolon or a comma:

Transcript_1;48:(((((.....)))))...(((((.....)))))
Transcript_2,122:((((((...)))..)))
Transcript_3;95:(((...(((...)))...)));189:(((...)))...((((((.....))))))
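
A target file line can be validated with a short parser before use. The sketch below is an assumption about how the format may be read, not RF Mutate's own parser (parse_target_line is a hypothetical name). It assumes that motif positions are given as integer coordinates, and it additionally checks that each target structure has balanced brackets.

```python
import re


def parse_target_line(line):
    """Split one target-file line into (transcript_id, {position: structure})."""
    fields = [f.strip() for f in re.split(r"[;,]", line.strip()) if f.strip()]
    if len(fields) < 2:
        raise ValueError(f"expected a transcript ID and at least one target: {line!r}")

    transcript_id = fields[0]
    targets = {}
    for entry in fields[1:]:
        position, colon, structure = entry.partition(":")
        if not colon or not position.isdigit() or not structure:
            raise ValueError(f"expected position:structure, got {entry!r}")
        if set(structure) - set("()."):
            raise ValueError(f"not a dot-bracket structure: {structure!r}")
        if structure.count("(") != structure.count(")"):
            raise ValueError(f"unbalanced structure: {structure!r}")
        targets[int(position)] = structure
    return transcript_id, targets
```

Returning a dictionary keyed by motif position makes it straightforward to cross-check the target file against the positions listed for the same transcript in the motif file.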



Output XML files

For each motif being mutagenized, RF Mutate will generate an XML file, with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<motif energy="-24.30" frame="0-65" id="seg4" position="2-65">
        <result n="0">
                <mutant codons="6,7" ddG="8.00" distance="46" energy="-16.30" probability="0.00">
                        <sequence>CUGGGGAUCUAUCAGAUUCUCGCCAUCUACUCAACUGUCGCCAGUUCACUGGUGCUUUUGGUCUCC</sequence>
                        <structure>..((((((((...))))))))((((.............((((((....))))))....))))....</structure>
                </mutant>
                <rescue codons="13,14" ddG="6.80" distance="6" energy="-17.50" probability="0.86">
                        <sequence>CUGGGGAUCUAUCAGAUUCUCGCCAUCUACUCAACUGUCGCGAGCUCACUGGUGCUUUUGGUCUCC</sequence>
                        <structure>..(((((((((((((...(((((................)))))....)))))).....)))))))</structure>
                </rescue>
        </result>
</motif>

The motif tag’s attributes provide information on the wild-type motif:

Attribute Optional Description
energy no Free energy (in kcal/mol) of the wild-type motif
frame yes In case the motif falls within an ORF, the frame attribute contains the start-end coordinates of the codons within which the motif is enclosed
id no Transcript ID
position no The coordinates of the first and last base within which the motif is enclosed

The mutant and rescue tags can carry the following six attributes:

Attribute Optional Description
bases yes For motifs falling within non-coding regions, reports a comma-separated list of the bases (0-based) that have been mutated
codons yes For motifs falling within ORFs, reports a comma-separated list of the codons (0-based) that have been mutated
ddG no Absolute difference (in kcal/mol) between the free energy of the mutant/rescue structure and that of the wild-type structure
distance no Base-pair distance between the mutant/rescue structure and the wild-type structure
energy no Free energy (in kcal/mol) of the mutant/rescue structure
probability no Average probability that the wild-type base pairs are still present in the mutant/rescue Boltzmann ensemble. Note: if parameter -ne (or --no-ensemble-prob) has been specified, this attribute will be set to NaN
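
The output files can be read with any XML library. As a minimal sketch, the following Python code uses the standard xml.etree.ElementTree module to collect the attributes described above into plain dictionaries. The function names and the layout of the returned dictionary are hypothetical choices made for illustration; only the tag and attribute names come from the example output.

```python
import xml.etree.ElementTree as ET


def summarize_motif(root):
    """Collect the attributes of a <motif> element and its results."""
    summary = {
        "id": root.get("id"),
        "position": root.get("position"),
        "frame": root.get("frame"),            # None if the motif is non-coding
        "energy": float(root.get("energy")),
        "results": [],
    }
    for result in root.findall("result"):
        entry = {}
        for tag in ("mutant", "rescue"):       # rescue is absent with -nr
            node = result.find(tag)
            if node is None:
                continue
            # Mutated sites are reported as codons (ORF) or bases (non-coding).
            sites = node.get("codons") or node.get("bases") or ""
            entry[tag] = {
                "in_orf": node.get("codons") is not None,
                "sites": [int(s) for s in sites.split(",") if s],
                "ddG": float(node.get("ddG")),
                "distance": float(node.get("distance")),
                "energy": float(node.get("energy")),
                "probability": float(node.get("probability")),  # NaN with -ne
                "sequence": node.findtext("sequence"),
                "structure": node.findtext("structure"),
            }
        summary["results"].append(entry)
    return summary


def load_motif(path):
    """Parse one RF Mutate XML file from disk."""
    return summarize_motif(ET.parse(path).getroot())
```

Calling load_motif on one of the files in the output directory returns a dictionary whose "results" list can then be filtered, for instance to keep only the mutants whose rescue restores the wild-type base pairs with high probability.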