YMF is a program that detects statistically
overrepresented words (motifs) in DNA sequences. The user may specify the characteristics of the motifs to be
detected. A motif here is a short string of nucleotides, degenerate symbols, and spacers.
'Motif size' is the number of non-spacer characters in a motif. Spacers ('N's) are constrained to be
in the center of the motif. Degenerate symbols allowed in a motif are R (purine - A or G), Y (pyrimidine - C or T),
W (A or T), and S (C or G).
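
As a rough illustration of the motif alphabet described above, the following sketch computes a motif's size and counts its occurrences in a sequence. The names `SYMBOLS`, `motif_size`, and `count_occurrences` are hypothetical and are not part of YMF. The sketch assumes that a spacer 'N' matches any nucleotide, that overlapping occurrences are counted, and that only the given strand is scanned; YMF's own counting rules may differ.

```python
import re

# Symbols named in the text, plus the spacer N (assumed here to match any base).
SYMBOLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",   # purine
    "Y": "[CT]",   # pyrimidine
    "W": "[AT]",
    "S": "[CG]",
    "N": "[ACGT]", # spacer
}

def motif_size(motif):
    """Number of non-spacer characters in the motif."""
    return sum(1 for ch in motif if ch != "N")

def count_occurrences(motif, sequence):
    """Count (possibly overlapping) matches of the motif on the given strand."""
    # A zero-width lookahead lets consecutive matches overlap.
    pattern = re.compile("(?=" + "".join(SYMBOLS[ch] for ch in motif) + ")")
    return sum(1 for _ in pattern.finditer(sequence))
```

For example, `motif_size("ACGNNNCGT")` ignores the three central spacers, and `count_occurrences("RCGY", "ACGTGCGC")` counts both `ACGT` and `GCGC` as matches.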
Given a set of sequences, YMF does an enumerative search among all motifs that match the specified characteristics,
scoring each motif for its significance, and outputs the top several motifs, sorted by their significance. The
significance of a motif is measured by the Z-score of its count in the input sequences. If a motif occurs N times
in the input sequences, but was expected to occur E times (with standard deviation S) in random sequences of
the same length generated by a suitable background model, then the Z-score of the motif is (N-E)/S.
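
The Z-score formula can be written out as a small helper. This is only a sketch of the arithmetic: the function name `z_score` and the example numbers are made up for illustration, and computing E and S from the background model is the substantial part of YMF that is not shown here.

```python
def z_score(observed, expected, std_dev):
    """Return (N - E) / S for a motif count.

    observed: N, the motif's count in the input sequences
    expected: E, its expected count under the background model
    std_dev:  S, the standard deviation of that count
    """
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (observed - expected) / std_dev

# Hypothetical numbers: 12 occurrences observed, 4 expected, S = 2.
score = z_score(12, 4.0, 2.0)
```

A large positive score means the motif occurs far more often than the background model predicts.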
YMF constructs a third-order Markov model of the background sequences (e.g., all known promoter sequences)
for the organism under study. Such models are already constructed for some model organisms, and can be used
by selecting the appropriate organism in the field called 'Organism'. Background models may also be constructed
by the user, by following the "Work with your own organism" link at the bottom of the YMF page.
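
To make "third-order Markov model" concrete, the sketch below estimates, from a set of background sequences, the probability of each nucleotide given the three nucleotides before it. The names `train_markov` and `transition_prob` are hypothetical, no smoothing or pseudocounts are applied, and this is not YMF's own code or file format; it only illustrates the kind of model meant.

```python
from collections import defaultdict

def train_markov(sequences, order=3):
    """Estimate P(next base | previous `order` bases) from background sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(order, len(seq)):
            context = seq[i - order:i]
            counts[context][seq[i]] += 1
    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        # Plain relative frequencies; unseen transitions get no entry.
        model[context] = {base: n / total for base, n in next_counts.items()}
    return model

def transition_prob(model, context, base):
    """Look up a transition probability, returning 0.0 for unseen cases."""
    return model.get(context, {}).get(base, 0.0)

# Toy background: the context "ACG" is followed once by T and once by A.
model = train_markov(["ACGTACGA"])
```

In practice the background set would be large (for example, all known promoter sequences of the organism), so that most of the 64 possible three-base contexts are well sampled.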
- Sinha, S. and Tompa, M. Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation. Nucleic Acids Research, vol. 30, no. 24, December 2002, 5549-5560.
- Sinha, S. and Tompa, M. A Statistical Method for Finding Transcription Factor Binding Sites. Eighth International Conference on Intelligent Systems for Molecular Biology, San Diego, CA, August 2000, 344-354.
FindExplanators is a program that extracts, from the set of significant motifs reported by YMF, a smaller set of "real"
motifs. More specifically, given a set of DNA sequences P and a set of motifs M (such as those reported by YMF), it extracts
a subset E of the motifs in M such that, given the occurrences of the motifs of E in the sequences P, the remaining motifs in M
are not statistically significant.
User-created organisms are stored on the server, and cookies are used to
ensure that only the creator of an organism can access its information.
However, this information may be removed by the webmaster without notice to the user,
in order to meet space constraints on the server.
If you are missing an organism you created earlier, it has either been removed
or is now showing up in the standard 'Organism' menu.
If you wish to have your organism show up in the standard 'Organism' menu, send
mail to sinhas@uiuc.edu.