RF Mutate designs mutations aimed at disrupting target structure motifs. It can also design compensatory mutations that restore the wild-type structure, and it can design mutations within ORFs without altering the encoded amino acid sequence.
RF Mutate requires one or more structure files (either in dot-bracket or CT format) and a motif file, containing the list of the structure motifs to mutagenize. Optionally, an ORF file can be provided, indicating whether (and where) an ORF is present within the analyzed transcripts; in this way, if a target motif overlaps an ORF, RF Mutate can introduce mutations in such a way that the encoded protein remains unchanged. In case no ORF file is provided, RF Mutate can automatically identify the longest ORF (if needed).
Mutagenesis results are reported in XML format, one file per motif.

Usage

To list the required parameters, simply type:

$ rf-mutate -h

Parameter	Type	Description
-p or --processors	int	Number of processors to use (Default: 1)
-o or --output-dir	string	Output directory (Default: rf_mutate/)
-ow or --overwrite		Overwrites the output directory if it already exists
-mf or --motif-file	string	Path to a file containing the list of motifs to mutate (mandatory)
-tf or --target-file	string	Path to a file containing a list of target structures the motifs should fold into upon mutagenesis (optional)
-of or --orf-file	string	Path to a file containing transcript ORFs (optional)
-lo or --longest-orf		Automatically finds the longest ORF
-mo or --min-orf-length	int	Minimum length (in aa) to select the longest ORF (requires `-lo`, Default: 50)
-als or --alt-start		Longest ORF is allowed to start with alternative start codons (requires `-lo`)
-ans or --any-start		Longest ORF is allowed to start with any codon (requires `-lo`)
-gc or --genetic-code	int	Genetic code table for the reference organism (1-33, Default: 1) Note: for a detailed list of the available genetic code tables, please refer to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
-ec or --exclude-codons	string	A comma (or semicolon) separated list of rare codons to be avoided
-md or --min-distance	float	Minimum (fractional) base-pair distance between wild-type and mutant (0-1, Default: 0.5)
-t or --tolerance	float	Maximum (fractional) base-pair distance between wild-type and rescue (0-1, Default: 0.2)
-mi or --max-iterations	int	Maximum number of iterations (>0, Default: 1000)
-me or --max-evaluate	int	Maximum number of mutants to evaluate (>0, Default: 1000)
-mr or --max-results	int	Maximum number of mutants to report per motif (Default: all)
-nm or --n-mutations	int	Number of bases (or codons) to simultaneously mutate (>0, Default: 1)
-nr or --no-rescue		Disables design of rescue mutations
-ne or --no-ensemble-prob		Disables evaluation of mutant/rescue Boltzmann ensemble
-vrf or --vienna-rnafold	string	Path to ViennaRNA RNAfold executable (Default: assumes RNAfold is in PATH)
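
The longest-ORF options (`-lo`, `-mo`, `-als`, `-ans`) can be pictured with the following Python sketch. It is not RF Mutate's implementation: the function name `longest_orf`, the `starts` argument, the hard-coded STOP codons of the standard code (table 1), and the choice to report 0-based inclusive coordinates that exclude the STOP codon are all assumptions made for illustration.

```python
# Hypothetical sketch of a longest-ORF search (standard genetic code only).
STOPS = {"TAA", "TAG", "TGA"}

def longest_orf(seq, min_aa=50, starts=("ATG",)):
    """Return (start, end) of the longest START..STOP ORF, or None.

    Coordinates are 0-based and inclusive; the STOP codon is excluded
    (an assumption, not a documented RF Mutate convention).
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in starts:
                start = i                      # first START in this frame
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3        # ORF length in amino acids
                if n_aa >= min_aa and (
                    best is None or n_aa > (best[1] - best[0] + 1) // 3
                ):
                    best = (start, i - 1)
                start = None                   # look for the next ORF
    return best

seq = "ATGGGGATCTATCAGATTCTGGCGATCTACTCAACTGTCGCCAGTTCACTGGTGCTTTTGGTCTCCTAA"
print(longest_orf(seq, min_aa=20))  # 22-aa ORF passes a 20-aa threshold
print(longest_orf(seq))             # too short for the default of 50 aa
```

Passing a longer tuple as `starts` mimics the idea behind `-als`; which alternative START codons RF Mutate actually accepts depends on the selected genetic code table.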

Genetic code tables

Choice of the genetic code affects the identification of the longest ORF (different organisms use different alternative START and/or STOP codons). The following tables are available:

Table	Description
1	Standard Code
2	Vertebrate Mitochondrial Code
3	Yeast Mitochondrial Code
4	Mold, Protozoan, and Coelenterate Mitochondrial Code and Mycoplasma/Spiroplasma Code
5	Invertebrate Mitochondrial Code
6	Ciliate, Dasycladacean and Hexamita Nuclear Code
9	Echinoderm and Flatworm Mitochondrial Code
10	Euplotid Nuclear Code
11	Bacterial, Archaeal and Plant Plastid Code
12	Alternative Yeast Nuclear Code
13	Ascidian Mitochondrial Code
14	Alternative Flatworm Mitochondrial Code
16	Chlorophycean Mitochondrial Code
21	Trematode Mitochondrial Code
22	Scenedesmus obliquus Mitochondrial Code
23	Thraustochytrium Mitochondrial Code
24	Pterobranchia Mitochondrial Code
25	Candidate Division SR1 and Gracilibacteria Code
26	Pachysolen tannophilus Nuclear Code
27	Karyorelict Nuclear Code
28	Condylostoma Nuclear Code
29	Mesodinium Nuclear Code
30	Peritrich Nuclear Code
31	Blastocrithidia Nuclear Code
33	Cephalodiscidae Mitochondrial UAA-Tyr Code

For a detailed description of each genetic code table, please refer to https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

Motif file

The motif file allows providing a list of target structure motifs to mutagenize.
It is composed of one or more lines, each one reporting the transcript ID and a comma (or semicolon) separated list of either motif start coordinates (0-based), or motifs in dot-bracket notation:

Transcript_1;25,67
Transcript_2,((((((....))))));44
Transcript_3;0,99;(((...(((...)))...))),189

Motif start positions must correspond to the first base-paired residue in a helix. In the following example, valid start positions are marked in green:

[Figure: Helix start]
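
As a rough aid for choosing start coordinates, the Python sketch below lists the opening bases that begin a helix in a dot-bracket string. It assumes that a helix is a run of directly stacked base pairs, so a stem interrupted by a bulge or internal loop yields more than one start; whether RF Mutate applies exactly this definition is not stated here. The names `pair_table` and `helix_starts` are illustrative.

```python
def pair_table(structure):
    """Map each position to its pairing partner (-1 if unpaired)."""
    table = [-1] * len(structure)
    stack = []
    for i, char in enumerate(structure):
        if char == "(":
            stack.append(i)
        elif char == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table

def helix_starts(structure):
    """Return 0-based positions of the first paired base of each helix."""
    table = pair_table(structure)
    starts = []
    for i, j in enumerate(table):
        if j > i:  # i is the 5' (opening) base of a pair
            # Stacked on the previous pair if (i-1, j+1) is also a pair
            stacked = i > 0 and table[i - 1] == j + 1
            if not stacked:
                starts.append(i)
    return starts

print(helix_starts("((..((...))..))"))  # [0, 4]
```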

Note

The name of the transcripts in the motif file must match the input file names (e.g. "Transcript#1" expects a file named "Transcript#1.ct", "Transcript#1.db", or "Transcript#1.fasta")

Important

If a dot-bracket structure is provided and it occurs more than once in the target transcript, only the first occurrence will be considered.
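
The motif file format described above can be read with a few lines of Python. This is a minimal sketch, not part of RF Mutate: the function name `parse_motif_line` is hypothetical, and it assumes that transcript IDs contain neither commas nor semicolons.

```python
import re

def parse_motif_line(line):
    """Split one motif file line into (transcript ID, list of motifs).

    Numeric fields become int start coordinates (0-based); all other
    fields must be dot-bracket strings.
    """
    fields = [f for f in re.split(r"[,;]", line.strip()) if f]
    transcript_id, motifs = fields[0], []
    for field in fields[1:]:
        if field.isdigit():
            motifs.append(int(field))
        elif re.fullmatch(r"[().]+", field):
            motifs.append(field)
        else:
            raise ValueError(f"Unrecognized motif: {field!r}")
    return transcript_id, motifs

print(parse_motif_line("Transcript_3;0,99;(((...(((...)))...))),189"))
# ('Transcript_3', [0, 99, '(((...(((...)))...)))', 189])
```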

ORF file

The ORF file allows specifying whether an ORF is present at a given position of the transcript.
It is composed of one or more lines, each one reporting the transcript ID and either the coordinates of the ORF (0-based, inclusive), or the amino acid sequence of the encoded protein (either full or partial):

Transcript_1;48-254
Transcript_2,122
Transcript_3;MYGAAAHKKLDAGASS

Note

Currently, a single ORF per transcript is supported. RF Mutate cannot deal with multiple/overlapping ORFs.

When a single value is provided (e.g. Transcript_2 in the above example), this will be treated as the start coordinate and the end coordinate will be automatically identified.
When providing an amino acid sequence, either the full sequence or just a portion of it can be provided. The sequence will then be automatically extended to the closest in-frame STOP codon (both upstream and downstream). This way, RF Mutate will be able to identify the underlying ORF, hence allowing target motif disruption without altering the encoded protein.
Looking at the following example:

0        9        19       29       39       49       59       69 
|--------|--------|--------|--------|--------|--------|--------|-----
 M  G  I  Y  Q  I  L  A  I  Y  S  T  V  A  S  S  L  V  L  L  V  S  *
ATGGGGATCTATCAGATTCTGGCGATCTACTCAACTGTCGCCAGTTCACTGGTGCTTTTGGTCTCCTAA
..(((((((((((((...((((((((..........))))))))....)))))).....)))))))...

it will be sufficient to indicate in the ORF file the starting "MGIYQ" portion (for example) of the amino acid sequence, to make RF Mutate identify the full underlying ORF.
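
The three kinds of ORF file entries (coordinate range, start coordinate only, amino acid sequence) can be illustrated with the Python sketch below. It is a simplified illustration rather than RF Mutate's code: it uses only the standard genetic code (table 1), the names `translate` and `resolve_orf` are hypothetical, and for the two inferred cases it reports 0-based inclusive coordinates that exclude the STOP codon, which is an assumption.

```python
import re

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (table 1), codons enumerated in TCAG order
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def translate(seq, frame=0):
    """Translate seq in the given frame; '*' marks STOP codons."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))

def resolve_orf(seq, spec):
    """Turn one ORF file entry into (start, end) coordinates, or None."""
    if re.fullmatch(r"\d+-\d+", spec):          # explicit range, used as is
        start, end = map(int, spec.split("-"))
        return start, end
    if spec.isdigit():                          # start only: scan to STOP
        start = int(spec)
        protein = translate(seq[start:])
        stop = protein.find("*")
        n_codons = stop if stop >= 0 else len(protein)
        return start, start + 3 * n_codons - 1
    for frame in range(3):                      # (partial) protein sequence
        protein = translate(seq, frame)
        hit = protein.find(spec.upper())
        if hit < 0:
            continue
        first = protein.rfind("*", 0, hit) + 1  # codon after upstream STOP
        stop = protein.find("*", hit)           # closest downstream STOP
        last = stop if stop >= 0 else len(protein)
        return frame + 3 * first, frame + 3 * last - 1
    return None

seq = "ATGGGGATCTATCAGATTCTGGCGATCTACTCAACTGTCGCCAGTTCACTGGTGCTTTTGGTCTCCTAA"
print(resolve_orf(seq, "MGIYQ"))  # (0, 65)
```

With the example transcript above, a partial sequence taken from the middle of the protein (for instance "STVAS") resolves to the same coordinates, since the match is extended to the flanking in-frame STOP codons or to the transcript ends.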

Target file

The target file allows specifying a target structure each motif should fold into upon mutagenesis.
It is composed of one or more lines, each one reporting the transcript ID, the motif position (as indicated in the motif file) and the target structure in dot-bracket notation. The motif position and the target structure should be separated by a colon, while consecutive motifs on the same transcript must be separated by a semicolon or a comma:

Transcript_1;48:(((((.....)))))...(((((.....)))))
Transcript_2,122:((((((...)))..)))
Transcript_3;95:(((...(((...)))...)));189:(((...)))...((((((.....))))))
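
A target file line can be parsed along the same lines as a motif file line. The Python sketch below is illustrative only: `parse_target_line` is a hypothetical name, and it assumes that transcript IDs contain no commas, semicolons, or colons.

```python
import re

def parse_target_line(line):
    """Split one target file line into (transcript ID, {motif: target}).

    Keys are int positions when numeric; otherwise the motif is kept
    as written in the file.
    """
    fields = [f for f in re.split(r"[,;]", line.strip()) if f]
    targets = {}
    for field in fields[1:]:
        motif, structure = field.split(":", 1)
        if not re.fullmatch(r"[().]+", structure):
            raise ValueError(f"Invalid target structure: {structure!r}")
        targets[int(motif) if motif.isdigit() else motif] = structure
    return fields[0], targets

print(parse_target_line("Transcript_2,122:((((((...)))..)))"))
# ('Transcript_2', {122: '((((((...)))..)))'})
```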

Output XML files

For each motif being mutagenized, RF Mutate will generate an XML file, with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<motif energy="-24.30" frame="0-65" id="seg4" position="2-65">
        <result n="0">
                <mutant codons="6,7" ddG="8.00" distance="46" energy="-16.30" probability="0.00">
                        <sequence>CUGGGGAUCUAUCAGAUUCUCGCCAUCUACUCAACUGUCGCCAGUUCACUGGUGCUUUUGGUCUCC</sequence>
                        <structure>..((((((((...))))))))((((.............((((((....))))))....))))....</structure>
                </mutant>
                <rescue codons="13,14" ddG="6.80" distance="6" energy="-17.50" probability="0.86">
                        <sequence>CUGGGGAUCUAUCAGAUUCUCGCCAUCUACUCAACUGUCGCGAGCUCACUGGUGCUUUUGGUCUCC</sequence>
                        <structure>..(((((((((((((...(((((................)))))....)))))).....)))))))</structure>
                </rescue>
        </result>
</motif>
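
Such a file can be loaded with Python's standard `xml.etree.ElementTree` module. The sketch below is one possible way to flatten the results into dictionaries; the function name `read_motif_xml` and the dictionary keys are choices made for this example, and the handling of a missing rescue tag (for instance when `-nr` is used) is an assumption.

```python
import xml.etree.ElementTree as ET

def read_motif_xml(xml_data):
    """Return (motif attributes, list of mutant/rescue records)."""
    root = ET.fromstring(xml_data)
    designs = []
    for result in root.findall("result"):
        for kind in ("mutant", "rescue"):
            node = result.find(kind)
            if node is None:       # assumed: no rescue tag when -nr is set
                continue
            designs.append({
                "result": int(result.get("n")),
                "type": kind,
                # 'codons' inside ORFs, 'bases' in non-coding regions
                "changed": node.get("codons") or node.get("bases"),
                "ddG": float(node.get("ddG")),
                "distance": int(node.get("distance")),
                "energy": float(node.get("energy")),
                "probability": float(node.get("probability")),  # NaN parses too
                "sequence": node.findtext("sequence"),
                "structure": node.findtext("structure"),
            })
    return dict(root.attrib), designs

example = b"""<?xml version="1.0" encoding="UTF-8"?>
<motif energy="-24.30" frame="0-65" id="seg4" position="2-65">
  <result n="0">
    <mutant codons="6,7" ddG="8.00" distance="46" energy="-16.30" probability="0.00">
      <sequence>CUGGGGAUCUAUCAGAUUCUCGCCAUCUACUCAACUGUCGCCAGUUCACUGGUGCUUUUGGUCUCC</sequence>
      <structure>..((((((((...))))))))((((.............((((((....))))))....))))....</structure>
    </mutant>
    <rescue codons="13,14" ddG="6.80" distance="6" energy="-17.50" probability="0.86">
      <sequence>CUGGGGAUCUAUCAGAUUCUCGCCAUCUACUCAACUGUCGCGAGCUCACUGGUGCUUUUGGUCUCC</sequence>
      <structure>..(((((((((((((...(((((................)))))....)))))).....)))))))</structure>
    </rescue>
  </result>
</motif>
"""

motif, designs = read_motif_xml(example)
for d in designs:
    print(motif["id"], d["type"], d["changed"], d["ddG"], d["probability"])
```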

The motif tag’s attributes provide information on the wild-type motif:

Attribute	Optional	Description
energy	no	Free energy (in kcal/mol) of the wild-type motif
frame	yes	In case the motif falls within an ORF, the frame attribute contains the start-end coordinates of the codons within which the motif is enclosed
id	no	Transcript ID
position	no	The coordinates of the first and last base within which the motif is enclosed

The mutant and rescue tags can carry the following six attributes:

Attribute	Optional	Description
bases	yes	For motifs falling within non-coding regions, reports a comma-separated list of the bases (0-based) that have been mutated
codons	yes	For motifs falling within ORFs, reports a comma-separated list of the codons (0-based) that have been mutated
ddG	no	Absolute difference (in kcal/mol) between the free energy of the mutant/rescue structure and that of the wild-type structure
distance	no	Base-pair distance between the mutant/rescue structure and the wild-type structure
energy	no	Free energy (in kcal/mol) of the mutant/rescue structure
probability	no	Average probability that the wild-type base-pairs are still present within the mutant/rescue Boltzmann ensemble. Note: if parameter `-ne` (or `--no-ensemble-prob`) has been specified, this attribute will be set to NaN
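
To make the `distance` attribute more concrete, the Python sketch below counts the bases whose pairing partner differs between two dot-bracket structures. This per-base count reproduces the `distance` values of the example XML (46 and 6) under the assumption that the wild-type structure is the one shown in the ORF file example, restricted to its first 66 positions. This is an observation about that example, not a statement about how RF Mutate computes the value internally. The names `pair_table` and `changed_bases` are illustrative.

```python
def pair_table(structure):
    """Map each position to its pairing partner (-1 if unpaired)."""
    table = [-1] * len(structure)
    stack = []
    for i, char in enumerate(structure):
        if char == "(":
            stack.append(i)
        elif char == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table

def changed_bases(a, b):
    """Count positions whose pairing partner differs between a and b."""
    if len(a) != len(b):
        raise ValueError("Structures must have the same length")
    return sum(x != y for x, y in zip(pair_table(a), pair_table(b)))

# Wild type: assumed to be the ORF example structure, first 66 positions
wild_type = "..(((((((((((((...((((((((..........))))))))....)))))).....)))))))"
mutant = "..((((((((...))))))))((((.............((((((....))))))....))))...."
rescue = "..(((((((((((((...(((((................)))))....)))))).....)))))))"

print(changed_bases(wild_type, mutant))  # 46
print(changed_bases(wild_type, rescue))  # 6
```

Note that counting differing base pairs instead of differing bases would give smaller numbers for the same structures (39 and 3), so the two conventions should not be mixed when comparing values.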