ProbTF is a software tool for predicting transcription factor binding using experimentally verified Position Weight Matrices (PWMs). The included sets of PWMs are from the TRANSFAC Public 7.0 database. ProbTF provides a probabilistic framework for transcription factor binding prediction which has three important features. First, ProbTF is probabilistic in nature and thus outputs a probability of binding (as opposed to a p-value). Second, the method answers the question of whether the whole promoter has a binding site. Third, ProbTF provides a principled way of combining multiple data sources, such as evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip, and other prior knowledge, into a unified probabilistic framework. This method was used in:
Full computational details can be found in the article.
The TRANSFAC matrices used in ProbTF are from the Public 7.0 release. Matrices were divided into sets based on the species they are derived from: 187 mouse matrices corresponding to 121 mouse TFs. Due to the TRANSFAC licence agreement, only matrices in the Public release can be used in analyses via a web service. Full list of included TRANSFAC Public 7.0 matrices: TRANSFACmatrices.txt.
Some of the position-specific weight matrices (PSWMs) can be remarkably diffuse because they are computed from only a few experimentally verified sequences. Consequently, many of the PSWMs are likely to contain zero counts, which would translate into zero probabilities. To prevent zero probabilities, we add one pseudocount to every entry of the PSWMs. To keep this process comparable between different matrices, we first scale the counts in each column so that they sum to 100, then add one pseudocount to each entry, and finally renormalize the column so that it sums to 1, which gives the position-specific frequency matrix (PSFM). For example:
           A        C        G        T
Before:    0        20       30       0
Step 1:    0        40       60       0
Step 2:    1        41       61       1
Final:     0.0096   0.3942   0.5865   0.0096
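The rescaling steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic for a single matrix column, not ProbTF's own code; the function name `column_to_frequencies` and its parameters `scale` and `pseudocount` are hypothetical names chosen for this example.

```python
def column_to_frequencies(counts, scale=100.0, pseudocount=1.0):
    """Turn one column of raw counts (A, C, G, T) into frequencies.

    Hypothetical helper: scale the counts to sum to `scale`, add
    `pseudocount` to each entry, then renormalize to sum to 1.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("column has no observations")
    # Step 1: scale the counts so that the column sums to `scale`.
    scaled = [c * scale / total for c in counts]
    # Step 2: add the pseudocount to every entry.
    padded = [s + pseudocount for s in scaled]
    # Final: renormalize so that the column sums to 1.
    norm = sum(padded)
    return [p / norm for p in padded]


freqs = column_to_frequencies([0, 20, 30, 0])
print([round(f, 4) for f in freqs])  # [0.0096, 0.3942, 0.5865, 0.0096]
```

Scaling to a fixed total before adding the pseudocount means a single pseudocount carries the same relative weight in every column, regardless of how many sequences the matrix was built from.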
FASTA files: FASTA is probably the simplest of formats for unaligned sequences. FASTA files are easily created in a text editor. Each sequence is preceded by a line starting with >. The first word on this line is the name of the sequence. The rest of the line is a description of the sequence (free format). The remaining lines contain the sequence itself. You can put as many letters on a sequence line as you want. An example is shown below:
>sequenceOne The first example sequence.
GATGGATGGGCTAGATGATCGGATAGAGAGAGAGAGATTGTAG
GATGGTATTTTAGATAGATAGAGAGAG
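For readers who want to check a file before submitting it, the format described above can be read with a short Python routine. This is a minimal sketch under the rules stated here (header line starting with >, first word is the name, rest is the description, remaining lines are sequence); `parse_fasta` is a hypothetical helper, not part of ProbTF.

```python
def parse_fasta(text):
    """Return a list of (name, description, sequence) tuples."""
    records = []
    name, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, description, "".join(chunks)))
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence lines may have any length; join them in order.
            chunks.append(line)
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


example = """>sequenceOne The first example sequence.
GATGGATGGGCTAGATGATCGGATAGAGAGAGAGAGATTGTAG
GATGGTATTTTAGATAGATAGAGAGAG
"""
records = parse_fasta(example)
for name, description, sequence in records:
    print(name, "|", description, "|", sequence)
```

Sequence text that appears before the first header line is silently skipped in this sketch; a stricter reader might raise an error instead.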
A set of DNA sequences was used to compute parameters of the Markovian background models. Models of order 0 to 3 are currently available.
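To illustrate what the parameters of such a background model are, the sketch below estimates an order-k Markov model from a set of sequences by counting which base follows each length-k context. It is an assumption-laden illustration, not the procedure used to build ProbTF's models: the function name `estimate_markov_background`, the added pseudocount, and the choice to skip positions containing non-ACGT characters are all choices made for this example.

```python
from itertools import product

ALPHABET = "ACGT"


def estimate_markov_background(sequences, order, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from DNA sequences.

    Hypothetical helper. Returns {context: {base: probability}};
    for order 0 the only context is the empty string.
    """
    # Start every (context, base) cell at the pseudocount (an assumption).
    counts = {
        "".join(ctx): dict.fromkeys(ALPHABET, pseudocount)
        for ctx in product(ALPHABET, repeat=order)
    }
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip positions with ambiguous characters such as N.
            if base in ALPHABET and context in counts:
                counts[context][base] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values())
        model[context] = {b: c / total for b, c in row.items()}
    return model


model = estimate_markov_background(["ACGTACGT"], order=1)
print(model["A"])  # distribution over the base that follows an A
```

An order-k model has 4^k contexts, so the order-3 model has 64 conditional distributions over the four bases.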
Downloads
If you have any problems using the web server, please read the FAQ first to see whether your question is answered there.
Otherwise, if you have any comments or questions regarding ProbTF, you may email:
The development of ProbTF is supported by grants from the National Institute of General Medical Sciences (R01-GM072855) and the National Institute of Allergy and Infectious Diseases (U54 AI54253).