momo <algorithm> [options] <PSM file>
<algorithm>: The algorithm used to search for motifs. Available algorithms include "simple", "motifx", and "modl".
<PSM file>: A tab-delimited peptide-spectrum match (PSM) file in which each row corresponds to a mass spectrum annotated with its corresponding peptide sequence. Currently, only PSM files generated by the Tide search engine are supported.
MoMo will create a directory, named momo_out by default. Any existing output files in the directory will be overwritten. The directory will contain:
- momo.txt: the PTM motifs in text format.
- momo.html: the same motifs in HTML format.

The default output directory can be overridden using the --o or --oc options.
Option | Parameter | Description | Default Behaviour |
---|---|---|---|
General Options | |||
--bg-filetype | fasta\|prealigned | This option indicates the format of the protein database. | The protein database is specified as a FASTA file. |
--count-threshold | num | This option indicates the minimum number of sequences in the phosphorylation data set that must match a residue/position pair in each recursive iteration of motif-x. | The default count threshold is 20 occurrences. |
--eliminate-repeats | num | This option removes duplicate copies of modifications with identical flanking sequences. The integer parameter specifies the width of the region used to determine identity. Because the window is symmetric around the central, modified amino acid, the width must be odd. To turn this option off, specify a width of 0. | Modifications with identical 7-residue windows (3 amino acids on each side of the modification) are treated as repeats, and the duplicate copies are removed. |
--fg-filetype | fasta\|prealigned\|psm | This option indicates the format of the input file containing the phosphorylated sequences. | The input is specified as a PSM file. |
--filter | [field],lt\|le\|eq\|ge\|gt,[threshold] | Only PSMs whose scores pass the specified threshold are accepted for analysis. The "[field]" component specifies the name of the column from which the score is drawn. The second component specifies the comparison: PSMs with scores less than (lt), less than or equal to (le), equal to (eq), greater than or equal to (ge), or greater than (gt) the threshold are retained. The third component is the threshold itself. | No filter. |
--hash-fasta | num | If a protein database is provided, locating each peptide within its protein can be sped up using an O(1) lookup table that maps each unique k-mer to a list of its locations. The number specifies the k-mer length. If the number is 0, the program uses linear search instead of building a lookup table. | An O(1) lookup table with k-mer length 6 is created. |
--max-iterations | num | The maximum number of iterations MoDL performs before stopping. | MoDL stops after 50 iterations. |
--max-motifs | num | The maximum number of motifs MoDL is allowed to find. | MoDL finds at most 100 motifs. |
--no-stop-decrease-iteration | num | MoDL stops if the MDL has not decreased for the specified number of iterations. | MoDL stops after 10 iterations in which the MDL does not decrease (it increases or stays equal). |
--min-occurences | num | A motif is constructed only if it reaches the specified number of occurrences. This threshold is applied after eliminating repeats. | A motif is reported only if its pattern occurs at least 5 times. |
--protein-database | protein database file | The protein database used to generate the PSM file. If provided, this file is used to find the amino acids flanking each modification and also to generate the background frequencies. | Flanking sequences are derived from the given PSM file, substituting "don't care" symbols for missing entries. Amino acid background frequencies are derived from the non-redundant protein database. |
--remove-unknowns | T\|F | This option indicates whether to remove all sequences that contain an 'X'. | Sequences containing unknown residues are not removed. |
--score-threshold | num | This option indicates the largest binomial probability for which a residue/position pair is counted as significant in each recursive iteration of motif-x. | A residue/position pair is significant only if its binomial probability is smaller than 0.000001. |
--single-motif-per-mass | T\|F | Provides the option of generating a single motif for each distinct modification mass. For example, phosphorylation is typically specified as a mass of 79.97 Da added to the amino acids S, T or Y. If this parameter is set to false, then three separate phosphorylation motifs are generated, each with a perfectly conserved central amino acid. If true, then all the phosphorylation events are combined into a single motif, with a mixture of S, T and Y in the central position. | False. |
--version | | Display the version and exit. | Run as normal. |
--width | num | This is an integer specifying the width of the motif. Because the motif is symmetric around the central, modified amino acid, the width parameter should be odd. | Motifs of width 7 are generated. |
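The --eliminate-repeats behaviour can be illustrated with a minimal Python sketch. It assumes each modification site is represented as an odd-length flanking sequence centred on the modified residue; the function name `eliminate_repeats` and this input format are hypothetical and not MoMo's internal API. Two sites count as repeats when their central windows of the given width are identical, and only the first copy is kept.

```python
def eliminate_repeats(windows: list[str], width: int = 7) -> list[str]:
    """Keep the first site for each distinct central window of `width` residues.

    Hypothetical input format: each string is an odd-length flanking sequence
    centred on the modified residue. A width of 0 disables deduplication.
    """
    if width == 0:
        return list(windows)
    if width < 0 or width % 2 == 0:
        raise ValueError("width must be a positive odd number (or 0 to disable)")
    half = width // 2
    seen: set[str] = set()
    kept: list[str] = []
    for seq in windows:
        if len(seq) % 2 == 0 or len(seq) < width:
            raise ValueError(f"{seq!r} is not an odd-length window of at least {width} residues")
        centre = len(seq) // 2  # index of the modified residue
        key = seq[centre - half:centre + half + 1]  # symmetric window used as the identity key
        if key not in seen:
            seen.add(key)
            kept.append(seq)
    return kept


sites = ["AAAKSPRLL", "GAAKSPRLE", "AAAKTPRLL"]
print(eliminate_repeats(sites))  # ['AAAKSPRLL', 'AAAKTPRLL']
```

The second site differs from the first only outside the central 7 residues, so with the default width it is dropped; with `width=9` all three sites would be kept.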
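The --filter parameter packs three comma-separated components into one string. The Python sketch below shows one way such a specification could be parsed and applied to a tab-delimited PSM table with a header row. The column name `score` and the helper names `parse_filter` and `filter_psms` are hypothetical; this illustrates the documented semantics and is not MoMo's implementation.

```python
import csv
import io
import operator

# Comparison keywords accepted by --filter, mapped to Python operators.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ge": operator.ge,
    "gt": operator.gt,
}


def parse_filter(spec: str):
    """Split "field,op,threshold"; splitting from the right lets the field name contain commas."""
    field, op, threshold = spec.rsplit(",", 2)
    if op not in COMPARATORS:
        raise ValueError(f"unknown comparison {op!r}; expected one of {sorted(COMPARATORS)}")
    return field, COMPARATORS[op], float(threshold)


def filter_psms(tsv_text: str, spec: str) -> list[dict]:
    """Return the rows of a tab-delimited PSM table whose score passes the filter."""
    field, compare, threshold = parse_filter(spec)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if compare(float(row[field]), threshold)]


# Hypothetical two-row PSM table; "score" is an illustrative column name.
psm_text = "sequence\tscore\nPEPTIDEK\t0.005\nSAMPLER\t0.2\n"
kept = filter_psms(psm_text, "score,lt,0.01")
print([row["sequence"] for row in kept])  # ['PEPTIDEK']
```

Splitting from the right with `rsplit(",", 2)` is a design choice made here so that a column name containing commas still parses; whether MoMo itself allows that is not stated above.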
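The k-mer lookup table behind --hash-fasta can be sketched in Python as a dictionary from each k-mer to the (protein, offset) pairs where it occurs. This is an illustration of the general technique under stated assumptions, not MoMo's code; the names `build_kmer_index` and `locate_peptide` and the toy protein database are hypothetical. A lookup uses the peptide's first k-mer and then verifies each candidate, since sharing a k-mer does not guarantee a full match.

```python
from collections import defaultdict


def build_kmer_index(proteins: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map each k-mer to every (protein id, offset) at which it occurs."""
    if k <= 0:
        return {}
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))
    return dict(index)


def locate_peptide(peptide, proteins, index, k):
    """Return every (protein id, offset) at which `peptide` occurs."""
    if k <= 0 or len(peptide) < k:
        # Linear search, as with --hash-fasta 0; also used when the peptide is shorter than k.
        hits = []
        for pid, seq in proteins.items():
            start = seq.find(peptide)
            while start != -1:
                hits.append((pid, start))
                start = seq.find(peptide, start + 1)
        return hits
    # Expected constant-time lookup of the first k-mer, then verify each candidate position.
    return [
        (pid, offset)
        for pid, offset in index.get(peptide[:k], [])
        if proteins[pid].startswith(peptide, offset)
    ]


proteins = {"P1": "MKTAYIAKQRQISFVK", "P2": "AYIAKQMKT"}  # hypothetical toy database
index = build_kmer_index(proteins, k=3)
print(locate_peptide("AYIAK", proteins, index, k=3))  # [('P1', 3), ('P2', 0)]
```

The toy example uses k = 3 only because the sequences are short; the documented default is 6. Larger k means fewer false candidates per lookup but more peptides too short to use the index.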
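The --count-threshold and --score-threshold options act together on each residue/position pair in motif-x. Assuming the usual binomial tail formulation, where n is the number of foreground sequences, k the number of them with the given residue at the given position, and p that residue's background frequency at that position, a pair is significant when P(X ≥ k) for X ~ Binomial(n, p) is below the score threshold and k meets the count threshold. The Python sketch below uses hypothetical helper names and the documented defaults (20 occurrences, 0.000001); it is not MoMo's implementation.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing each term in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("background frequency must lie strictly between 0 and 1")
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_term)
    return min(total, 1.0)  # guard against rounding slightly above 1


def is_significant(count, n, background_freq, score_threshold=1e-6, count_threshold=20):
    """Apply both motif-x style cut-offs to one residue/position pair (hypothetical helper)."""
    return (count >= count_threshold
            and binomial_tail(count, n, background_freq) < score_threshold)


# 25 of 100 foreground windows carry the residue, against a 5% background frequency.
print(is_significant(25, 100, 0.05))  # True
```

Computing each term in log space keeps binomial coefficients for large data sets from overflowing a float, at the cost of summing many tiny terms.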
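The effect of --single-motif-per-mass comes down to how modification sites are grouped before motifs are built. In the Python sketch below, each site is a hypothetical (central residue, modification mass, window) tuple and `group_sites` is an illustrative name, not part of MoMo. When the option is false the grouping key includes the central residue; when true, all sites sharing a mass are pooled.

```python
from collections import defaultdict


def group_sites(sites, single_motif_per_mass: bool):
    """Group (residue, mass, window) tuples into the sets from which motifs would be built.

    With single_motif_per_mass=False the key includes the central residue, giving one
    group per residue; with True, all sites sharing a modification mass are pooled.
    """
    groups = defaultdict(list)
    for residue, mass, window in sites:
        key = (mass,) if single_motif_per_mass else (mass, residue)
        groups[key].append(window)
    return dict(groups)


# Hypothetical phosphosites: the same 79.97 Da mass on S, T and Y.
sites = [("S", 79.97, "RRASLGK"), ("T", 79.97, "LLRTPSL"), ("Y", 79.97, "EEDYVEV")]
print(len(group_sites(sites, False)))  # 3
print(len(group_sites(sites, True)))   # 1
```

The sketch compares masses exactly; real data would likely need masses rounded to a tolerance before grouping, which is an assumption not covered by the option description.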
If you use MoMo in your research, please cite the following paper: