Version 1.0
DRIMust Manual
Input
Input sequences
Input type: The input to DRIMust is a list of sequences in FASTA format (DNA, RNA and protein alphabets are supported). The input list can be of two types:
  • Ranked list: One list of sequences, ordered by a parameter of interest (for example: expression level). DRIMust will search for motifs enriched at the top of the list compared to the rest of the list (the top of the list is dynamically determined by DRIMust).
  • Target and background lists: Two lists of sequences - a target list and a background list (the order within each list is not important). DRIMust will search for motifs enriched in the target list compared to the background list.
In both cases, the total number of sequences must not exceed 40,000 and the total number of characters must not exceed 4,000,000. Sequences may contain only the characters 'AGCTUN' (DNA and RNA) or 'ACDEFGHIKLMNPQRSTVWYX' (protein). Sequences containing more than 5% 'N' or 'X' characters are omitted from the analysis.
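These limits can be checked before submission. The following Python sketch parses FASTA text and applies the stated limits. The function names (parse_fasta, check_input) are hypothetical, and treating 'N' as the wildcard for DNA/RNA and 'X' as the wildcard for protein is an assumption made here, since 'N' is a valid amino acid (asparagine).

```python
# Hypothetical pre-submission check based on the input limits stated in this manual.
# Function names are illustrative; they are not part of DRIMust.
from typing import Dict, List, Tuple

MAX_SEQUENCES = 40_000
MAX_TOTAL_CHARS = 4_000_000
NUCLEOTIDE_ALPHABET = set("AGCTUN")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYX")
MAX_WILDCARD_FRACTION = 0.05  # more than 5% 'N' (or 'X') -> sequence omitted


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; multi-line sequences are joined and upper-cased."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def check_input(records: List[Tuple[str, str]], protein: bool = False) -> Dict[str, list]:
    """Report limit violations and return the sequences that would remain after filtering."""
    alphabet = PROTEIN_ALPHABET if protein else NUCLEOTIDE_ALPHABET
    # Assumption: 'N' is the wildcard for DNA/RNA and 'X' for protein ('N' is asparagine there).
    wildcard = "X" if protein else "N"
    errors: List[str] = []
    if len(records) > MAX_SEQUENCES:
        errors.append(f"too many sequences: {len(records)} > {MAX_SEQUENCES}")
    total_chars = sum(len(seq) for _, seq in records)
    if total_chars > MAX_TOTAL_CHARS:
        errors.append(f"too many characters: {total_chars} > {MAX_TOTAL_CHARS}")
    kept: List[Tuple[str, str]] = []
    for header, seq in records:
        unsupported = set(seq) - alphabet
        if unsupported:
            errors.append(f"{header}: unsupported characters {sorted(unsupported)}")
            continue
        if seq and seq.count(wildcard) / len(seq) > MAX_WILDCARD_FRACTION:
            continue  # would be omitted from the analysis
        kept.append((header, seq))
    return {"errors": errors, "kept": kept}
```

Because the limits refer to the total input, for target and background lists the check is presumably best run on both lists together, e.g. check_input(parse_fasta(target_text) + parse_fasta(background_text)).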
Search mode:
  • Single-strand search mode: DRIMust treats the input sequences as the plus strand and searches for over-represented motifs at the top of the list. This mode is suitable for DNA, RNA and protein sequences.
  • Double-strand search mode: DRIMust takes into account both the input sequences (as the plus strand) and their reverse complements (as the minus strand) and searches for motifs that are enriched in both strands. This mode is suitable for DNA and RNA sequences only; the dataset may contain only 'AGCTUN' characters.
* Note that double-strand search mode requires a longer running time than single-strand mode.
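To make the double-strand mode concrete, the Python sketch below derives the minus strand of a sequence as its reverse complement and tests whether a k-mer occurs on either strand. It is an illustration, not DRIMust's implementation: the helper names are hypothetical, and choosing U or T for the complement based on the sequence content is a choice made here. Because every sequence is scanned twice, it also suggests why double-strand mode takes longer.

```python
# Illustrative sketch of what "searching both strands" involves; DRIMust's own
# implementation is not described in this manual. Names are hypothetical.
from typing import Tuple

DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
RNA_COMPLEMENT = str.maketrans("ACGUN", "UGCAN")


def reverse_complement(seq: str) -> str:
    """Return the minus strand of seq (read 5'->3'); U is used when seq looks like RNA."""
    table = RNA_COMPLEMENT if ("U" in seq and "T" not in seq) else DNA_COMPLEMENT
    return seq.translate(table)[::-1]


def strands(seq: str) -> Tuple[str, str]:
    """Plus strand as given, and its reverse complement as the minus strand."""
    return seq, reverse_complement(seq)


def occurs_on_either_strand(seq: str, kmer: str) -> bool:
    """True if kmer appears on the plus or the minus strand of seq."""
    return any(kmer in strand for strand in strands(seq))
```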
Search parameters
Motif length range: DRIMust can search for motifs of a single length or of a range of lengths. The widest allowed range is 4-20 characters. The default range is 5-10 characters for single-strand search mode and 10 characters for double-strand search mode. To search for a single length, enter the same value in both the 'Min. length' and 'Max. length' boxes.
Statistical significance threshold: DRIMust reports motifs whose P-value is better (smaller) than this threshold. The default threshold is 10^-6; other thresholds between 10^-2 and 10^-15 can be chosen by the user.
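The length and threshold rules can be summarized as a small validation object. In this Python sketch the class SearchParams and the function default_params are hypothetical; only the numeric limits and defaults come from this manual, and the double-strand default of 10 is taken literally as a single length.

```python
# Hypothetical client-side check of the search parameters; the limits and defaults
# are the ones stated in this manual, the class and function names are illustrative only.
from dataclasses import dataclass

MIN_ALLOWED_LENGTH, MAX_ALLOWED_LENGTH = 4, 20
LOOSEST_THRESHOLD, STRICTEST_THRESHOLD = 1e-2, 1e-15


@dataclass(frozen=True)
class SearchParams:
    min_length: int
    max_length: int
    p_threshold: float = 1e-6  # default significance threshold

    def __post_init__(self) -> None:
        if not MIN_ALLOWED_LENGTH <= self.min_length <= self.max_length <= MAX_ALLOWED_LENGTH:
            raise ValueError("motif lengths must satisfy 4 <= min <= max <= 20")
        if not STRICTEST_THRESHOLD <= self.p_threshold <= LOOSEST_THRESHOLD:
            raise ValueError("P-value threshold must lie between 1e-15 and 1e-2")

    def lengths(self) -> range:
        """Every motif length that will be searched (min == max selects one length)."""
        return range(self.min_length, self.max_length + 1)

    def reports(self, p_value: float) -> bool:
        """A motif is reported only if its P-value is better (smaller) than the threshold."""
        return p_value < self.p_threshold


def default_params(double_strand: bool) -> SearchParams:
    # Defaults as stated in the manual: 5-10 (single-strand) and 10 (double-strand).
    return SearchParams(10, 10) if double_strand else SearchParams(5, 10)
```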
General parameters
Job name: An optional parameter that lets you give your job an informative name. Otherwise, the job is assigned a unique numeric identifier.
E-mail address: Enables you to receive a link to the results by e-mail. This is useful when submitting very long jobs: calculation time depends on the number of sequences, their length and the motif length range, and double-strand search mode requires a longer running time than single-strand mode. If you choose not to provide an e-mail address, we recommend bookmarking the results page.
Results
The DRIMust motif search is divided into two phases. In the first phase, DRIMust searches for k-mers that are over-represented at the top of the input sequence list. In the second phase, DRIMust heuristically expands the most promising k-mers and creates motifs represented by position-specific scoring matrices (PSSMs).
The significant motifs can be viewed on the results page at three levels of detail:
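This manual does not name the statistic used in the first phase to decide that a k-mer is over-represented at the top of a ranked list. As an illustration only, the Python sketch below scores k-mers with the minimum-hypergeometric (mHG) score, one standard rank-based statistic that tests every cutoff of the list and keeps the most significant one, so the "top of the list" is chosen from the data rather than fixed in advance. The function names are hypothetical, and the naive scan is far slower than an efficient index would be.

```python
# Illustrative only: the manual does not name the statistic DRIMust uses. This sketch
# scores a k-mer with the minimum-hypergeometric (mHG) score, one standard way to find
# the most enriched "top" of a ranked list without fixing the cutoff in advance.
from math import comb
from typing import List, Sequence, Tuple


def hypergeom_tail(N: int, B: int, n: int, b: int) -> float:
    """P(X >= b) when n of N items are drawn without replacement and B of the N are hits."""
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1))
    return hits / comb(N, n)


def mhg(labels: Sequence[int]) -> Tuple[float, int]:
    """Return (mHG score, best cutoff) for a 0/1 vector ordered from top to bottom of the list."""
    N, B = len(labels), sum(labels)
    best_p, best_n, b = 1.0, 0, 0
    for n, label in enumerate(labels, start=1):
        b += label
        if not label:
            continue  # adding a non-hit never lowers the tail, so only test cutoffs ending in a hit
        p = hypergeom_tail(N, B, n, b)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


def rank_kmers(ranked_seqs: Sequence[str], k: int, top: int = 5) -> List[Tuple[float, str]]:
    """Naively score every k-mer that occurs in the input; smaller scores mean stronger enrichment."""
    kmers = {s[i:i + k] for s in ranked_seqs for i in range(len(s) - k + 1)}
    scored = [(mhg([int(km in s) for s in ranked_seqs])[0], km) for km in kmers]
    return sorted(scored)[:top]
```

The mHG score is not itself a calibrated P-value: scanning many cutoffs and many k-mers calls for a correction, so it should not be compared directly with the P-values DRIMust reports. For target and background lists, a natural variant (again an assumption) is to place the target sequences first and evaluate only the single cutoff n = number of target sequences.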
* Please note that the results are kept on our server for one month.
Examples and sample data
Ranked list - single-strand search mode
The dataset in this example contains RNA sequences bound to the human Pumilio 2 (PUM2) RNA-binding protein, obtained by the PAR-CLIP technique. The list comprises 9995 sequences (each 100 nucleotides long), ranked by cluster abundance as published by Hafner et al., 2010 [1]. DRIMust was run in single-strand search mode with the remaining parameters set to their defaults. DRIMust found one motif at a P-value of 4.9e-394, which is the experimentally verified PUM2 consensus motif [1].
1. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M Jr, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141, 129-141.
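A ranked list is simply a FASTA file whose order carries the ranking. The Python sketch below, built around a hypothetical helper to_ranked_fasta, sorts (identifier, sequence, score) tuples and writes them top-ranked first: descending for scores where larger is better, such as cluster abundance in the PUM2 example, and ascending for scores where smaller is better, such as the binding P-values in the Hoxa2 example. It is not how the sample data were produced; it only shows one way to prepare your own input.

```python
# Hypothetical helper for preparing ranked-list input: sort sequences by a score
# and write them as FASTA, most relevant sequence first.
from typing import Iterable, Tuple


def to_ranked_fasta(records: Iterable[Tuple[str, str, float]], descending: bool = True) -> str:
    """records are (identifier, sequence, score) tuples.

    descending=True puts the highest score first (e.g. cluster abundance);
    descending=False puts the lowest score first (e.g. binding P-values).
    Python's sort is stable, so tied scores keep their input order.
    """
    ordered = sorted(records, key=lambda record: record[2], reverse=descending)
    return "".join(f">{ident}\n{seq}\n" for ident, seq, _score in ordered)
```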
Ranked list - double-strand search mode
In this example, 8245 Hoxa2-binding regions from the ChIP-seq experiment of Donaldson et al., 2012 [2] were analyzed. Donaldson et al. [2] defined each binding region as the 200 bp centred on the MACS-defined peak summit, and the DNA sequences were ranked by their binding P-values (as defined by Donaldson et al.). DRIMust was run in double-strand search mode with the remaining parameters set to their defaults. DRIMust found one motif at a P-value of 1.70e-80, which is the known Hoxa2 consensus motif [2].
2. Donaldson IJ, Amin S, Hensman JJ, Kutejova E, Rattray M, Lawrence N, Hayes A, Ward CM, Bobola N (2012) Genome-wide occupancy links Hoxa2 to Wnt-β-catenin signaling in mouse embryonic development. Nucleic Acids Res., 40, 3990-4001.
Target and background lists - double-strand search mode
The following example uses the TP53 high-confidence binding sites reported by Smeenk et al., 2008, comprising 1546 loci in the human genome (target set). Each target sequence extends 200 bp upstream and downstream of the proposed binding site. The background set contains 1546 sequences selected at random from the human genome. DRIMust was run in double-strand search mode with target and background lists, and the remaining parameters were set to their defaults. DRIMust found one motif at a P-value of 2.22e-266, which is the TP53 consensus motif [3].
3. Riley T, Sontag E, Chen P, Levine A (2008) Transcriptional control of human p53-regulated genes. Nat. Rev. Mol. Cell Biol., 9, 402-412.
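If you need a comparable background set for your own target list, random genomic windows are one option. The Python sketch below is hypothetical: this manual only says the background sequences were taken at random from the human genome, so matching each window to a target length and rejecting windows with more than 5% 'N' are assumptions made here. The genome is passed as a plain dictionary of chromosome name to sequence, and the function name sample_background is illustrative.

```python
# Hypothetical sketch: draw random genomic windows as a background set. The manual only
# says the background sequences were taken at random from the human genome; matching the
# target lengths and skipping N-rich windows are assumptions made here.
import random
from typing import Dict, List, Sequence, Tuple


def sample_background(genome: Dict[str, str], lengths: Sequence[int], max_n_fraction: float = 0.05,
                      seed: int = 0, max_tries: int = 1000) -> List[Tuple[str, str]]:
    """Return one (name, sequence) window per requested length, uniform over acceptable windows."""
    rng = random.Random(seed)
    chroms = list(genome)
    background: List[Tuple[str, str]] = []
    for length in lengths:
        # Weight each chromosome by how many windows of this length it can hold.
        weights = [max(0, len(genome[c]) - length + 1) for c in chroms]
        if sum(weights) == 0:
            raise ValueError(f"no chromosome is long enough for a {length} bp window")
        for _ in range(max_tries):
            chrom = rng.choices(chroms, weights=weights)[0]
            start = rng.randrange(len(genome[chrom]) - length + 1)
            window = genome[chrom][start:start + length].upper()
            # Mirror the input rule: windows with more than 5% 'N' would be omitted anyway.
            if window.count("N") / length <= max_n_fraction:
                background.append((f"{chrom}:{start}-{start + length}", window))
                break
        else:
            raise ValueError(f"no acceptable {length} bp window found in {max_tries} tries")
    return background
```

Sampled windows may overlap the target sites; excluding those regions first would be a reasonable refinement.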