Motivation of CoopTFD

Transcriptional regulation of gene expression is one of the major mechanisms for cells to respond to environmental and physiological changes. This kind of regulation is usually accomplished by cooperative transcription factors (TFs). The cooperativity among TFs enables cells to use a relatively small number of TFs in establishing the complex spatial and temporal patterns of gene expression. Therefore, identifying cooperative TFs is helpful for uncovering the mechanisms of transcriptional regulation.

With the advent of many high-throughput experimental technologies (e.g. DNA sequencing, microarray, ChIP-chip, TF knockout experiments, protein array, and nucleosome positioning sequencing), it is now possible to study the cooperative interactions among TFs. Many computational algorithms have been developed to predict cooperative TF pairs by using one data source or integrating multiple data sources generated by high-throughput experimental technologies.

Most existing algorithms for identifying cooperative TFs were applied to the model organism Saccharomyces cerevisiae. Different algorithms predicted different numbers of cooperative TF pairs, ranging from a dozen to more than three thousand. These predicted cooperative TF pairs (PCTFPs) are valuable resources that provide testable hypotheses for future experimental investigation. Unfortunately, these PCTFPs are scattered across different papers, and until now no database has collected the yeast cooperative TFs from the literature. This prompted us to construct the first such database, named Cooperative Transcription Factors Database (CoopTFD).





What is CoopTFD?

CoopTFD has a comprehensive collection of 2622 PCTFPs in yeast from 17 existing algorithms. To help users judge the biological plausibility of a specific PCTFP of interest, our database provides five types of validation information: (i) the algorithms which predict this PCTFP, (ii) the publications which experimentally show that this PCTFP has physical or genetic interactions, (iii) the publications which experimentally study the biological roles of both TFs of this PCTFP, (iv) the common Gene Ontology (GO) terms of this PCTFP, and (v) the common target genes of this PCTFP. We believe that CoopTFD will be a valuable resource for yeast biologists to study gene regulation.





Collection of PCTFPs from existing algorithms

In yeast, many cooperative TF pairs have been predicted by various algorithms in the literature. We collected 2622 PCTFPs among 143 TFs from 17 computational studies, each of which developed a distinct algorithm to predict PCTFPs from one or more data sources. The following table gives the details of these 17 computational studies.

Publication | Data sources integrated | Algorithm description | Number of PCTFPs identified
Banerjee and Zhang (2003) | ChIP-chip data and gene expression data | A TF pair is called a PCTFP if the genes bound by both TFs are more co-expressed than are the genes bound by either TF alone. | 31
Harbison et al. (2004) | ChIP-chip data and promoter sequence data | A TF pair is called a PCTFP if their binding sites co-occur more frequently within the same promoters than would be expected by chance. | 94
Nagamine et al. (2005) | ChIP-chip data and PPI data | A TF pair is called a PCTFP if the genes bound by both TFs are closer in the PPI network than are the genes bound by either TF alone. | 24
Tsai et al. (2005) | ChIP-chip data and gene expression data | A TF pair is called a PCTFP if their interaction effect (estimated using ANOVA) significantly influences the expression of genes bound by both TFs. | 18
Balaji et al. (2006) | ChIP-chip data | A TF pair is called a PCTFP if the observed number of shared target genes is higher than random expectation. | 3459
Chang et al. (2006) | ChIP-chip data and gene expression data | A stochastic system model is developed to assess TF cooperativity. | 55
He et al. (2006) | ChIP-chip data and gene expression data | The multivariate statistical method, ANOVA, is used to test whether the expression of the target genes was significantly influenced by the cooperative effect of their TFs. | 30
Wang (2006) | ChIP-chip data, gene expression data, and promoter sequence data | Pairwise mixed graphical models or Gaussian graphical models are used for identifying combinatorial regulation of transcription factors. | 14
Yu et al. (2006) | ChIP-chip data and promoter sequence data | An algorithm called Motif-PIE is developed for predicting interacting TF pairs based on the co-occurrence of their binding motifs and the distance between the motifs in promoter sequences. | 300
Elati et al. (2007) | Gene expression data | A data mining technique called LICORN is developed for deriving cooperative regulations. | 20
Datta and Zhao (2008) | ChIP-chip data | Log-linear models are used to study cooperative bindings among TFs. | 25
Chuang et al. (2009) | ChIP-chip data, gene expression data, and promoter sequence data | A TF pair is called a PCTFP if the distance between their TFBSs (in the promoters of their common target genes) is significantly closer than expected by chance. | 13
Wang et al. (2009) | ChIP-chip data, gene expression data, promoter sequence data, PPI data, TF-gene documented regulation data, and comparative genomic data | A Bayesian network framework is presented to reconstruct a high-confidence whole-genome map of transcriptional cooperativity in Saccharomyces cerevisiae by integrating a comprehensive list of 15 genomic features. | 159
Yang et al. (2010) | ChIP-chip data and TF knockout data | Cooperative TF pairs are predicted by identifying the most statistically significant overlap of target genes regulated by two TFs in ChIP-chip data and TF knockout data. | 186
Chen et al. (2012) | ChIP-chip data and promoter sequence data | A method called simTFBS is developed for inferring TF-TF interactions by incorporating motif discovery as a fundamental step when detecting overlapping targets of TFs based on ChIP-chip data. | 221
Lai et al. (2014) | TF-gene documented regulation data, TFBS data, and nucleosome occupancy data | A TF pair is called a PCTFP if (i) these two TFs have a significantly higher number of common target genes than random expectation and (ii) their binding sites (in the promoters of their common target genes) tend to be co-depleted of nucleosomes in order to make these binding sites simultaneously accessible to TF binding. | 27
Wu and Lai (2015) | TF binding and TF perturbation data | A TF pair is called a PCTFP if the overlap of the targets (defined by TF binding and TF perturbation data) of these two TFs is higher than random expectation. | 50
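
Several of the algorithms listed above (for example Balaji et al. and Wu and Lai) call a TF pair cooperative when the two TFs share more target genes than expected by chance. One common way to score such an overlap is a one-sided hypergeometric test, sketched below. This is an illustration only: the cited studies may use different statistics, and the function name and the toy numbers are assumptions, not values taken from any of the papers.

```python
from math import comb


def overlap_p_value(n_genes: int, n_targets_a: int, n_targets_b: int, n_shared: int) -> float:
    """Probability of seeing at least `n_shared` common targets by chance.

    Null model (an assumption of this sketch): the targets of TF B are drawn
    at random, without replacement, from `n_genes` genes, of which
    `n_targets_a` are targets of TF A (hypergeometric distribution).
    """
    max_shared = min(n_targets_a, n_targets_b)
    # Count the ways to draw TF B's targets so that k of them are targets of TF A.
    favorable = sum(
        comb(n_targets_a, k) * comb(n_genes - n_targets_a, n_targets_b - k)
        for k in range(n_shared, max_shared + 1)
    )
    return favorable / comb(n_genes, n_targets_b)


# Hypothetical numbers: 6000 genes, 100 and 120 targets, 15 of them shared.
p = overlap_p_value(6000, 100, 120, 15)
print(f"p-value for the observed overlap: {p:.3g}")
```

A small p-value means the observed overlap is unlikely under the random model, which is the sense in which a pair has more shared targets "than random expectation".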




Construction of validation information for each PCTFP

To help users judge the biological plausibility of a PCTFP, we provide five types of validation information using various data sources.

1. The number of algorithms that predicted this PCTFP. The more algorithms that predict a PCTFP, the higher the confidence in it.

2. The number of publications that experimentally show that this PCTFP has physical or genetic interactions. The publications were retrieved from the BioGRID database. Having physical or genetic interactions strengthens confidence in the biological plausibility of this PCTFP.

3. The number of publications that experimentally study the biological roles of both TFs of this PCTFP. The publications were retrieved from the SGD database. If a PCTFP is biologically significant, both TFs are likely to be studied in the same publication. Therefore, the higher the number, the more biologically plausible this PCTFP is.

4. The common Gene Ontology (GO) terms of this PCTFP. The GO terms of a TF were retrieved from the SGD database. Having common GO terms provides further evidence for the biological plausibility of this PCTFP.

5. The common target genes of this PCTFP. The target genes of a TF were retrieved from the YEASTRACT database. The regulatory associations between a TF and its target genes are validated by TF binding evidence, that is, experimental evidence (from band-shift, footprinting or ChIP assays) showing that the TF binds to the promoters of the target genes. Since the biological role of a cooperative TF pair is to co-regulate the expression of a set of genes, knowing the common target genes of the two TFs of a PCTFP helps users evaluate its biological plausibility.
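
Three of the five types of validation information (the algorithm count, the common GO terms, and the common target genes) reduce to counting and set intersection. The sketch below shows this on made-up data. It is not the database's actual implementation: the names `validate_pair` and `Validation`, the placeholder TF, GO, and gene identifiers, and the data layout are all assumptions. The two publication counts (types 2 and 3) depend on records from BioGRID and SGD and are left out.

```python
from dataclasses import dataclass


@dataclass
class Validation:
    n_algorithms: int       # type 1: how many algorithms predicted the pair
    common_go_terms: set    # type 4: GO terms annotated to both TFs
    common_targets: set     # type 5: target genes shared by both TFs


def validate_pair(tf_a, tf_b, predictions, go_terms, targets):
    """Summarize validation information for the unordered pair (tf_a, tf_b)."""
    pair = frozenset((tf_a, tf_b))  # unordered: (A, B) equals (B, A)
    n_algorithms = sum(1 for pairs in predictions.values() if pair in pairs)
    return Validation(
        n_algorithms=n_algorithms,
        common_go_terms=go_terms.get(tf_a, set()) & go_terms.get(tf_b, set()),
        common_targets=targets.get(tf_a, set()) & targets.get(tf_b, set()),
    )


# Made-up toy data; all identifiers are placeholders.
predictions = {
    "algorithm_1": {frozenset(("TF_A", "TF_B"))},
    "algorithm_2": {frozenset(("TF_A", "TF_B")), frozenset(("TF_A", "TF_C"))},
}
go_terms = {"TF_A": {"GO_x", "GO_y"}, "TF_B": {"GO_y"}, "TF_C": {"GO_z"}}
targets = {"TF_A": {"gene1", "gene2"}, "TF_B": {"gene2", "gene3"}, "TF_C": {"gene4"}}

result = validate_pair("TF_A", "TF_B", predictions, go_terms, targets)
print(result)
```

Storing each pair as a `frozenset` keeps the lookup independent of the order in which the two TFs are given.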





Database interface

CoopTFD provides two search modes and a browse mode.


First search mode:

Users can input a list of TFs of interest and specify the lowest number of algorithms that should predict a PCTFP.


Then CoopTFD returns a figure showing a cooperative TF network containing all PCTFPs among the input TFs.


In addition, a table lists the five types of validation information for each PCTFP in the cooperative TF network.


The five types of validation information are as follows. The first three types are the number of algorithms which predict this PCTFP, the number of publications which experimentally show that this PCTFP has physical or genetic interactions, and the number of publications which experimentally study the biological roles of both TFs of this PCTFP. Clicking on a number opens a webpage showing the details (e.g. the authors, titles, journals and dates) of the publications.


The abstract of each publication in PubMed can also be seen by clicking on the title of the publication.


The fourth type of validation information is the number of common GO terms of this PCTFP. Clicking on the number opens a webpage showing the names of the common GO terms.


By clicking on the names, users are redirected to the SGD database to see the details of these GO terms.


The last type is the number of common target genes of this PCTFP. Clicking on the number opens a webpage showing the names of the common target genes and the number of pieces of TF binding evidence that validate each TF-target gene relationship.


The publications which provide the TF binding evidence can also be shown by clicking on the number.
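
The first search mode can be pictured as a filter over the stored pairs: keep every PCTFP whose two TFs are both in the input list and which is predicted by at least the requested number of algorithms. The sketch below illustrates that logic only. The function name `search_by_tf_list`, the `pair_support` mapping, and the placeholder TF names are assumptions and do not describe how the CoopTFD website is actually implemented.

```python
from itertools import combinations


def search_by_tf_list(pair_support, tfs, min_algorithms=1):
    """Return (tf1, tf2, n_algorithms) for each PCTFP among the input TFs.

    `pair_support` maps frozenset({tf1, tf2}) to the number of algorithms
    that predicted the pair (a hypothetical in-memory layout).
    """
    hits = []
    for tf1, tf2 in combinations(sorted(set(tfs)), 2):
        n = pair_support.get(frozenset((tf1, tf2)), 0)
        # n > 0 ensures that only pairs actually stored are reported.
        if n > 0 and n >= min_algorithms:
            hits.append((tf1, tf2, n))
    return hits


# Made-up toy data with placeholder TF names.
pair_support = {
    frozenset(("TF_A", "TF_B")): 3,
    frozenset(("TF_A", "TF_C")): 1,
    frozenset(("TF_B", "TF_D")): 2,
}

edges = search_by_tf_list(pair_support, ["TF_A", "TF_B", "TF_C"], min_algorithms=2)
print(edges)
```

The returned tuples are the edges of the cooperative TF network that would be drawn for the input TFs.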



Second search mode:

Users can input a TF of interest and specify the lowest number of algorithms that should predict a PCTFP.


Then CoopTFD returns a table listing all PCTFPs that involve the input TF and satisfy the specification.


Browse mode:

Users can browse CoopTFD by TF name. In total, 2622 PCTFPs among 143 TFs are deposited in CoopTFD.


When users click on a TF name, our database returns a table listing five types of validation information of each PCTFP that is related to the TF of interest.

This is the same result that users obtain from the second search mode when they specify one as the lowest number of algorithms that should predict a PCTFP.
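
The relationship between the second search mode and the browse mode can be made concrete with a short sketch: browsing a TF is the single-TF search with the threshold fixed at one. As before, the function names, the `pair_support` mapping, and the placeholder TF names are assumptions for illustration, not the site's actual code.

```python
def search_by_tf(pair_support, tf, min_algorithms=1):
    """Return (partner, n_algorithms) for each PCTFP that involves `tf`.

    Assumes every stored pair consists of two distinct TFs.
    """
    hits = []
    for pair, n in pair_support.items():
        if tf in pair and n >= min_algorithms:
            (partner,) = pair - {tf}  # the other member of the pair
            hits.append((partner, n))
    return sorted(hits)


def browse(pair_support, tf):
    """Browse mode: every stored PCTFP involving `tf` (threshold of one)."""
    return search_by_tf(pair_support, tf, min_algorithms=1)


# Made-up toy data with placeholder TF names.
pair_support = {
    frozenset(("TF_A", "TF_B")): 3,
    frozenset(("TF_A", "TF_C")): 1,
    frozenset(("TF_B", "TF_D")): 2,
}

print(search_by_tf(pair_support, "TF_A", min_algorithms=2))
print(browse(pair_support, "TF_A"))
```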