Algorithm Overview
DRIMust is a web-accessible software implementation of a new statistical and algorithmic approach [1,2], developed to enable efficient motif searches that cover a broader range of motif spaces than other state-of-the-art motif search tools.
In particular, DRIMust can efficiently search for long motifs and for motifs over large alphabets.
DRIMust takes as input a ranked list of sequences and returns motifs that are over-represented at the top of the list, where the threshold that defines the "top" is determined in a data-driven manner.
We base our search on the minimum-hypergeometric (mHG) framework [3,4].
Our algorithm uses suffix trees to efficiently enumerate candidate k-mers, which are then assessed using the mHG statistic.
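As a rough illustration of these two steps, the Python sketch below enumerates k-mers in a ranked list and scores one of them with the mHG statistic. It is not DRIMust's implementation: a plain dictionary stands in for the suffix tree, and the function names (hg_tail, mhg_score, kmer_occurrence_vectors) and the toy sequences are assumptions made for this example. The mHG score is the minimum hypergeometric tail probability over all cutoffs n; it is not itself a P-value, because it still has to be corrected for the multiple cutoffs tested [3,4].

```python
from math import comb

def hg_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N, B, n)."""
    upper = min(n, B)
    favourable = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return favourable / comb(N, n)

def mhg_score(occurrence):
    """Return (min tail probability, cutoff n) for a 0/1 vector in rank order."""
    N = len(occurrence)
    B = sum(occurrence)
    best_score, best_cutoff = 1.0, 0
    hits = 0
    for n in range(1, N + 1):
        hits += occurrence[n - 1]          # occurrences among the top n sequences
        score = hg_tail(N, B, n, hits)
        if score < best_score:
            best_score, best_cutoff = score, n
    return best_score, best_cutoff

def kmer_occurrence_vectors(ranked_seqs, k):
    """Map each k-mer to a 0/1 vector marking the ranked sequences that contain it.

    Dictionary-based stand-in for suffix-tree enumeration (assumption).
    """
    vectors = {}
    for rank, seq in enumerate(ranked_seqs):
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            vectors.setdefault(kmer, [0] * len(ranked_seqs))[rank] = 1
    return vectors

# Toy ranked list (hypothetical data): ACGT occurs only in the top three sequences.
ranked = ["ACGTAC", "TTACGT", "ACGTTT", "GGGGGG", "CCCCCC", "TTTTTT"]
vectors = kmer_occurrence_vectors(ranked, 4)
score, cutoff = mhg_score(vectors["ACGT"])
print(score, cutoff)
```

In this toy input the best cutoff falls after the third sequence, which is the point at which the data themselves define the "top" of the list.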
In cases where sequence ranking is not relevant or not available, DRIMust allows the user to upload positive and negative sets of sequences.
In this case, DRIMust searches for motifs that are enriched in the positive set, using the negative set as the background.
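With a fixed positive/negative partition there is no cutoff to optimize, so one natural way to score a candidate motif is a single hypergeometric tail test. The Python sketch below shows that calculation under this assumption; the text above does not specify the exact test DRIMust applies in this mode, and the function name enrichment_p_value and the toy sequences are hypothetical.

```python
from math import comb

def enrichment_p_value(motif, positives, negatives):
    """Hypergeometric tail P(X >= b): chance of seeing at least b motif-containing
    sequences in the positive set if the motif-containing sequences were spread
    at random over positives and negatives."""
    N = len(positives) + len(negatives)          # all sequences
    n = len(positives)                           # size of the positive set
    b = sum(motif in s for s in positives)       # positives containing the motif
    B = b + sum(motif in s for s in negatives)   # all sequences containing it
    favourable = sum(comb(B, i) * comb(N - B, n - i)
                     for i in range(b, min(n, B) + 1))
    return favourable / comb(N, n)

# Toy data (hypothetical): ACGT occurs in every positive and in no negative.
positives = ["ACGTAA", "TACGTT", "GGACGT"]
negatives = ["TTTTTT", "CCCCCC", "GGGGGG"]
p = enrichment_p_value("ACGT", positives, negatives)
print(p)
```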
The DRIMust algorithmic approach is unique in combining an efficient search with a ranked list approach and rigorous P-value estimation.
The web-accessible implementation provides a user-friendly interface and graphical representation of results.
How does DRIMust work?
An initial motif search phase enumerates k-mers, which are words of length k over the alphabet of the input sequences, and calculates their statistical significance.
The most promising k-mers are then passed as input to a heuristic motif search phase, where they are expanded iteratively until a locally optimal motif is found.
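The iterative expansion can be pictured as a greedy hill climb. The Python sketch below is only a schematic of that idea, not DRIMust's actual heuristic: it assumes the only allowed move is extending the motif by one letter on either side, and it uses a toy scoring function (lower is better) in place of the mHG P-value. The names expand_motif and toy_score, and the sequences, are hypothetical.

```python
def expand_motif(seed, alphabet, score):
    """Extend the motif by one letter at a time while the score keeps improving.

    `score` maps a motif to a number where lower is better; the search stops
    at the first motif that no single-letter extension improves on.
    """
    current, current_score = seed, score(seed)
    while True:
        candidates = ([c + current for c in alphabet] +
                      [current + c for c in alphabet])
        best = min(candidates, key=score)
        best_score = score(best)
        if best_score >= current_score:      # local optimum reached
            return current, current_score
        current, current_score = best, best_score

# Toy ranked data (hypothetical): three top-ranked and three bottom-ranked sequences.
top = ["TACGTA", "GACGTC", "AACGTT"]
bottom = ["ACGAAA", "CCCCCC", "ACGCCC"]

def toy_score(motif):
    """Stand-in for a P-value: bottom occurrences minus top occurrences."""
    return sum(motif in s for s in bottom) - sum(motif in s for s in top)

motif, final_score = expand_motif("ACG", "ACGT", toy_score)
print(motif, final_score)
```

Here the seed ACG occurs in both halves of the list, while its extension ACGT occurs only in the top half, so the climb moves to ACGT and then stops because no further extension scores better.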
The mHG P-value of each motif, together with its occurrences in the input ranked list, can be displayed next to its Shannon logo representation.
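For readers unfamiliar with Shannon logos, the height of each logo column is conventionally the information content of that position, computed as log2 of the alphabet size minus the Shannon entropy of the letters observed there. The Python sketch below shows this standard calculation without any small-sample correction; it is an illustration only, and the function name information_content and the aligned occurrences are assumptions rather than DRIMust output.

```python
from collections import Counter
from math import log2

def information_content(occurrences, alphabet_size=4):
    """Per-position information content (in bits) of equal-length motif occurrences."""
    length = len(occurrences[0])
    info = []
    for pos in range(length):
        counts = Counter(occ[pos] for occ in occurrences)
        total = sum(counts.values())
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        info.append(log2(alphabet_size) - entropy)
    return info

# Hypothetical aligned occurrences of one motif over the DNA alphabet.
occurrences = ["ACGT", "ACGA", "ACGT", "ACGA"]
bits = information_content(occurrences)
print(bits)
```

The first three positions are fully conserved and reach the maximum of 2 bits for a four-letter alphabet, while the last position is split evenly between two letters and carries 1 bit.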
References:
[1] Leibovich L, Paz I, Yakhini Z, Mandel-Gutfreund Y. (2013) DRIMust: a web server for Discovering Rank Imbalanced Motifs Using Suffix Trees. Nucleic Acids Res.
[2] Leibovich L, Yakhini Z. (2012) Efficient motif search in ranked lists and applications to variable gap motifs. Nucleic Acids Res.
[3] Eden E, Lipson D, Yogev S, Yakhini Z. (2007) Discovering Motifs in Ranked Lists of DNA Sequences. PLoS Comput Biol.
[4] Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. (2009) GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics.