RF Mutate designs mutations aimed at disrupting target structure motifs. It can also design compensatory mutations aimed at restoring the wild-type structure, and it can design mutations within ORFs without altering the encoded amino acid sequence.
RF Mutate requires one or more structure files (either in dot-bracket or CT format) and a motif file, containing the list of the structure motifs to mutagenize. Optionally, an ORF file can be provided, indicating whether (and where) an ORF is present within the analyzed transcripts; in this way, if a target motif overlaps an ORF, RF Mutate can introduce mutations in such a way that the encoded protein remains unchanged. In case no ORF file is provided, RF Mutate can automatically identify the longest ORF (if needed).
Mutagenesis results are reported in XML format, one file per motif.
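To make the ORF-preserving mode more concrete: a mutation leaves the protein unchanged when a codon is swapped for a synonymous one. The following is only a minimal sketch of that idea, not RF Mutate's actual implementation. It assumes the standard genetic code (table 1), and `synonymous_codons` and its `exclude` argument are hypothetical names that loosely mirror the purpose of the -ec option.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon, exclude=()):
    """Return the other codons encoding the same amino acid as `codon`.

    `exclude` is a hypothetical stand-in for a list of rare codons that
    must never be proposed as replacements.
    """
    codon = codon.upper().replace("U", "T")
    banned = {c.upper().replace("U", "T") for c in exclude}
    amino_acid = CODON_TABLE[codon]
    return sorted(
        c for c, aa in CODON_TABLE.items()
        if aa == amino_acid and c != codon and c not in banned
    )


print(synonymous_codons("CUG"))                   # leucine alternatives
print(synonymous_codons("CUG", exclude=["CUA"]))  # same, without CTA
print(synonymous_codons("AUG"))                   # methionine has none
```

Codons with no synonymous alternative (such as AUG under the standard code) cannot be changed without altering the protein, which is why the set of mutable positions inside an ORF is smaller than in a non-coding region.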
Usage
To list the required parameters, simply type:
$ rf-mutate -h
Parameter | Type | Description |
---|---|---|
-p or --processors | int | Number of processors to use (Default: 1) |
-o or --output-dir | string | Output directory (Default: rf_mutate/) |
-ow or --overwrite | string | Overwrites the output directory if it already exists |
-mf or --motif-file | string | Path to a file containing the list of motifs to mutate (mandatory) |
-of or --orf-file | string | Path to a file containing transcript ORFs (optional) |
-lo or --longest-orf | | Automatically finds the longest ORF |
-mo or --min-orf-length | int | Minimum length (in aa) to select the longest ORF (requires -lo, Default: 50) |
-als or --alt-start | | Longest ORF is allowed to start with alternative start codons (requires -lo) |
-ans or --any-start | | Longest ORF is allowed to start with any codon (requires -lo) |
-gc or --genetic-code | int | Genetic code table for the reference organism (1-33, Default: 1) Note: for a detailed list of the available genetic code tables, please refer to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi |
-ec or --exclude-codons | string | A comma (or semicolon) separated list of rare codons to be avoided |
-md or --min-distance | float | Minimum (fractional) base-pair distance between wild-type and mutant (0-1, Default: 0.5) |
-t or --tollerance | float | Maximum (fractional) base-pair distance between wild-type and rescue (0-1, Default: 0.2) |
-mi or --max-iterations | int | Maximum number of iterations (>0, Default: 1000) |
-me or --max-evaluate | int | Maximum number of mutants to evaluate (>0, Default: 1000) |
-mr or --max-results | int | Maximum number of mutants to report per motif (Default: all) |
-nm or --n-mutations | int | Number of bases (or codons) to simultaneously mutate (>0, Default: 1) |
-nr or --no-rescue | | Disables design of rescue mutations |
-ne or --no-ensemble-prob | | Disables evaluation of mutant/rescue Boltzmann ensemble |
-vrf or --vienna-rnafold | string | Path to ViennaRNA RNAfold executable (Default: assumes RNAfold is in PATH) |
Genetic code tables
Choice of the genetic code affects the identification of the longest ORF (different organisms use different alternative START and/or STOP codons). The following tables are available:
Table | Description |
---|---|
1 | Standard Code |
2 | Vertebrate Mitochondrial Code |
3 | Yeast Mitochondrial Code |
4 | Mold, Protozoan, and Coelenterate Mitochondrial Code and Mycoplasma/Spiroplasma Code |
5 | Invertebrate Mitochondrial Code |
6 | Ciliate, Dasycladacean and Hexamita Nuclear Code |
9 | Echinoderm and Flatworm Mitochondrial Code |
10 | Euplotid Nuclear Code |
11 | Bacterial, Archaeal and Plant Plastid Code |
12 | Alternative Yeast Nuclear Code |
13 | Ascidian Mitochondrial Code |
14 | Alternative Flatworm Mitochondrial Code |
16 | Chlorophycean Mitochondrial Code |
21 | Trematode Mitochondrial Code |
22 | Scenedesmus obliquus Mitochondrial Code |
23 | Thraustochytrium Mitochondrial Code |
24 | Pterobranchia Mitochondrial Code |
25 | Candidate Division SR1 and Gracilibacteria Code |
26 | Pachysolen tannophilus Nuclear Code |
27 | Karyorelict Nuclear Code |
28 | Condylostoma Nuclear Code |
29 | Mesodinium Nuclear Code |
30 | Peritrich Nuclear Code |
31 | Blastocrithidia Nuclear Code |
33 | Cephalodiscidae Mitochondrial UAA-Tyr Code |
For a detailed description of each genetic code table, please refer to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
Motif file
The motif file allows providing a list of target structure motifs to mutagenize.
It is composed of one or more lines, each one reporting the transcript ID and a comma (or semicolon) separated list of either motif start coordinates (0-based), or motifs in dot-bracket notation:
Transcript_1;25,67
Transcript_2,((((((....))))));44
Transcript_3;0,99;(((...(((...)))...))),189
Motif start positions must correspond to the first base-paired residue in a helix.
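Candidate start positions can be enumerated directly from a dot-bracket string. The sketch below is an illustration only: it assumes that a helix is a maximal run of stacked base pairs (so a bulge or interior loop starts a new helix), which may not match RF Mutate's internal definition exactly. `helix_starts` is a hypothetical helper name.

```python
def helix_starts(dot_bracket):
    """Return the 0-based position of the first (5'-most) paired base of each helix."""
    stack, partner = [], {}
    for i, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(i)
        elif char == ")":
            # Pair this closing bracket with the most recent unmatched opening one.
            partner[stack.pop()] = i

    starts = []
    for i in sorted(partner):
        # Pair (i, j) extends a helix only if (i - 1, j + 1) is also a pair;
        # otherwise i is the first paired base of a new helix.
        if partner.get(i - 1) != partner[i] + 1:
            starts.append(i)
    return starts


print(helix_starts("((((((....))))))"))   # [0]
print(helix_starts("(((..((...))..)))"))  # [0, 5]
```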
Note
The transcript names in the motif file must match the input file names (e.g. "Transcript#1" expects a file named "Transcript#1.ct", "Transcript#1.db", or "Transcript#1.fasta").
Important
If a dot-bracket structure is provided and it occurs more than once in the target transcript, only the first occurrence will be considered.
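The motif file format described above can be read with a few lines of code. This is a minimal sketch, not RF Mutate's own parser: `parse_motif_line` is a hypothetical helper, and it assumes that transcript IDs never contain a comma or a semicolon.

```python
import re


def parse_motif_line(line):
    """Split one motif-file line into (transcript_id, targets).

    Each target is either an int (0-based motif start coordinate) or a
    str (motif in dot-bracket notation), kept in the order given.
    """
    fields = [f.strip() for f in re.split(r"[;,]", line.strip()) if f.strip()]
    transcript_id, raw_targets = fields[0], fields[1:]
    targets = []
    for target in raw_targets:
        if target.isdigit():
            targets.append(int(target))
        elif set(target) <= set("()."):
            targets.append(target)
        else:
            raise ValueError(f"Unrecognized motif entry: {target!r}")
    return transcript_id, targets


for example in (
    "Transcript_1;25,67",
    "Transcript_2,((((((....))))));44",
    "Transcript_3;0,99;(((...(((...)))...))),189",
):
    print(parse_motif_line(example))
```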
ORF file
The ORF file allows specifying whether an ORF is present at a given position of the transcript.
It is composed of one or more lines, each one reporting the transcript ID and either the coordinates of the ORF (0-based, inclusive), or the amino acid sequence of the encoded protein (either full or partial):
Transcript_1;48-254
Transcript_2,122
Transcript_3;MYGAAAHKKLDAGASS
Note
Currently, a single ORF per transcript is supported. RF Mutate cannot deal with multiple/overlapping ORFs.
When a single value is provided (e.g. Transcript_2 in the above example), this will be treated as the start coordinate and the end coordinate will be automatically identified.
When providing an amino acid sequence, either the full sequence or just a portion of it can be provided. The sequence will then be automatically extended to the closest in-frame STOP codon (both upstream and downstream). This way, RF Mutate will be able to identify the underlying ORF, hence allowing target motif disruption without altering the encoded protein.
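The three accepted forms of an ORF file entry (coordinate range, start coordinate only, amino acid sequence) can be told apart as in the following sketch. It is an illustration under stated assumptions rather than RF Mutate's parser: `parse_orf_line` is a hypothetical name, and transcript IDs are assumed not to contain a comma or a semicolon.

```python
import re


def parse_orf_line(line):
    """Classify one ORF-file line as a range, a start coordinate, or a protein."""
    transcript_id, value = re.split(r"[;,]", line.strip(), maxsplit=1)
    value = value.strip()

    match = re.fullmatch(r"(\d+)-(\d+)", value)
    if match:
        # Explicit 0-based, inclusive start-end coordinates.
        return transcript_id, {"start": int(match[1]), "end": int(match[2])}
    if value.isdigit():
        # Start coordinate only: the end has to be identified later.
        return transcript_id, {"start": int(value), "end": None}
    if value.isalpha():
        # Full or partial amino acid sequence.
        return transcript_id, {"protein": value.upper()}
    raise ValueError(f"Unrecognized ORF entry: {value!r}")


for example in ("Transcript_1;48-254", "Transcript_2,122", "Transcript_3;MYGAAAHKKLDAGASS"):
    print(parse_orf_line(example))
```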
Looking at the following example:
0        9         19        29        39        49        59
|--------|---------|---------|---------|---------|---------|---------
M  G  I  Y  Q  I  L  A  I  Y  S  T  V  A  S  S  L  V  L  L  V  S  *
ATGGGGATCTATCAGATTCTGGCGATCTACTCAACTGTCGCCAGTTCACTGGTGCTTTTGGTCTCCTAA
..(((((((((((((...((((((((..........))))))))....)))))).....)))))))...
it will be sufficient to indicate in the ORF file the starting "MGIYQ" portion (for example) of the amino acid sequence, to make RF Mutate identify the full underlying ORF.
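The extension of a partial amino acid sequence to the enclosing ORF can be sketched as follows. This is not RF Mutate's code: it assumes the standard genetic code (table 1), takes the first reading frame in which the fragment is found, and reports 0-based inclusive coordinates that exclude the STOP codon. `translate` and `locate_orf` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def locate_orf(seq, fragment):
    """Return (start, end) of the ORF containing `fragment`, or None."""
    for frame in range(3):
        protein = translate(seq[frame:])
        hit = protein.find(fragment)
        if hit < 0:
            continue
        # Upstream: start right after the closest in-frame STOP (or at the frame start).
        first = protein.rfind("*", 0, hit) + 1
        # Downstream: end right before the closest in-frame STOP (or at the frame end).
        stop = protein.find("*", hit + len(fragment))
        last = (stop if stop >= 0 else len(protein)) - 1
        return frame + 3 * first, frame + 3 * last + 2
    return None


seq = "ATGGGGATCTATCAGATTCTGGCGATCTACTCAACTGTCGCCAGTTCACTGGTGCTTTTGGTCTCCTAA"
print(locate_orf(seq, "MGIYQ"))  # (0, 65)
```

Any other portion of the protein (e.g. "LAIYS") resolves to the same coordinates, since the extension always runs to the nearest in-frame STOP codons on both sides.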
Output XML files
For each motif being mutagenized, RF Mutate will generate an XML file, with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<motif energy="-24.30" frame="0-65" id="seg4" position="2-65">
    <result n="0">
        <mutant codons="6,7" ddG="8.00" distance="46" energy="-16.30" probability="0.00">
            <sequence>CUGGGGAUCUAUCAGAUUCUCGCCAUCUACUCAACUGUCGCCAGUUCACUGGUGCUUUUGGUCUCC</sequence>
            <structure>..((((((((...))))))))((((.............((((((....))))))....))))....</structure>
        </mutant>
        <rescue codons="13,14" ddG="6.80" distance="6" energy="-17.50" probability="0.86">
            <sequence>CUGGGGAUCUAUCAGAUUCUCGCCAUCUACUCAACUGUCGCGAGCUCACUGGUGCUUUUGGUCUCC</sequence>
            <structure>..(((((((((((((...(((((................)))))....)))))).....)))))))</structure>
        </rescue>
    </result>
</motif>
The motif tag’s attributes provide information on the wild-type motif:
Attribute | Optional | Description |
---|---|---|
energy | no | Free energy (in kcal/mol) of the wild-type motif |
frame | yes | In case the motif falls within an ORF, the frame attribute contains the start-end coordinates of the codons within which the motif is enclosed |
id | no | Transcript ID |
position | no | The coordinates of the first and last base within which the motif is enclosed |
Six attributes are possible within the mutant/rescue tags:
Attribute | Optional | Description |
---|---|---|
bases | yes | For motifs falling within non-coding regions, reports a comma-separated list of the bases (0-based) that have been mutated |
codons | yes | For motifs falling within ORFs, reports a comma-separated list of the codons (0-based) that have been mutated |
ddG | no | Absolute difference (in kcal/mol) between the free energy of the mutant/rescue structure and that of the wild-type structure |
distance | no | Base-pair distance between the mutant/rescue structure and the wild-type structure |
energy | no | Free energy (in kcal/mol) of the mutant/rescue structure |
probability | no | Average probability that the wild-type base-pairs are still present within the mutant/rescue Boltzmann ensemble. Note: if parameter -ne (or --no-ensemble-prob) has been specified, this attribute will be set to NaN |
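The output files can be read with any XML parser. The following minimal sketch uses Python's standard `xml.etree.ElementTree` module and relies only on the tags and attributes documented above; `read_results` is a hypothetical helper name, and treating a missing rescue tag as "no rescue designed" (e.g. with -nr) is an assumption.

```python
import xml.etree.ElementTree as ET


def read_results(xml_data):
    """Return one dict per mutant/rescue entry of an RF Mutate XML document.

    `xml_data` is the file content (bytes or str).
    """
    motif = ET.fromstring(xml_data)
    rows = []
    for result in motif.findall("result"):
        for kind in ("mutant", "rescue"):
            node = result.find(kind)
            if node is None:  # assumed: tag absent when no rescue was designed
                continue
            # Exactly one of "codons" (within ORFs) or "bases" (non-coding) is expected.
            unit = "codons" if node.get("codons") is not None else "bases"
            rows.append({
                "transcript": motif.get("id"),
                "kind": kind,
                "mutated": (unit, [int(x) for x in node.get(unit).split(",")]),
                "ddG": float(node.get("ddG")),
                "distance": int(node.get("distance")),
                "energy": float(node.get("energy")),
                # float() also accepts "NaN", which is reported when -ne is used.
                "probability": float(node.get("probability")),
                "sequence": node.findtext("sequence"),
                "structure": node.findtext("structure"),
            })
    return rows
```

For instance, `read_results(open("motif.xml", "rb").read())` (the file name is a placeholder) returns a list that can be sorted by `ddG` or filtered by `probability` to shortlist mutant/rescue pairs.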