BACKGROUND

Our pipeline utilizes the following algorithm and databases:

CLOVER
Clover is an algorithm for identifying functional sites in DNA sequences. Given a set of DNA sequences, it compares them to a library of sequence motifs (e.g., transcription factor binding patterns) and identifies which, if any, of the motifs are statistically overrepresented in the sequence set (Frith et al., 2004).
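
To make the idea of motif overrepresentation concrete, the following is a minimal sketch and not Clover's actual scoring scheme. It assumes a simplified stand-in: each sequence is scored by its best log-odds match to a motif, and the mean score of the set is compared against shuffled copies of the same sequences to obtain an empirical p-value. All function names, the pseudocount, the uniform background, and the toy motif are hypothetical choices made for illustration; sequences are assumed to contain only uppercase A, C, G, and T.

```python
import math
import random

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    Assumes a uniform background; the pseudocount avoids log(0).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_site_score(pwm, seq):
    """Highest log-odds score over all windows (forward strand only)."""
    width = len(pwm)
    if len(seq) < width:
        return float("-inf")
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def mean_best_score(pwm, seqs):
    """Average of the best site score across a set of sequences."""
    return sum(best_site_score(pwm, s) for s in seqs) / len(seqs)


def empirical_p_value(pwm, seqs, n_shuffles=200, seed=0):
    """Fraction of shuffled sets scoring at least as high as the real set.

    Shuffling each sequence preserves its base composition but destroys
    any motif instances, which gives a simple null model.
    """
    rng = random.Random(seed)
    observed = mean_best_score(pwm, seqs)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = ["".join(rng.sample(s, len(s))) for s in seqs]
        if mean_best_score(pwm, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical toy motif "TATA" built from 10 aligned sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = log_odds_matrix(toy_counts)
toy_seqs = ["GGCCTATAGGCC", "CCGGTATACCGG", "GCGCTATAGCGC", "CGCGTATACGCG"]
p_value = empirical_p_value(toy_pwm, toy_seqs)
```

A small p-value here suggests that the motif occurs in the sequence set more strongly than expected from base composition alone. Clover itself uses a different raw score and several background models, so this sketch only illustrates the general principle.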

JASPAR
The JASPAR CORE database contains a curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes (Portales-Casamar et al., 2010). Our package includes position weight matrices (PWMs) for 205 different experimentally defined transcription factors.

TRANSFAC
TRANSFAC is a redundant, manually curated database extracted from the original scientific literature. It provides the most comprehensive data set of transcription factor – gene interactions available (Matys et al., 2006). Our package includes 2,208 PWMs from TRANSFAC.

HOCOMOCO
The Homo sapiens Comprehensive Model Collection (HOCOMOCO) of transcription factor (TF) binding models was obtained by careful integration of data from different sources. HOCOMOCO contains 426 non-redundant, hand-curated binding models for 401 human TFs (Kulakovskiy et al., 2013). The models were derived from the following sources: human ENCODE Yale/HudsonAlpha ChIP-Seq data presented in the UCSC Genome Browser, multiplexed parallel SELEX, the TRANSFAC 2011.2 SITE table (data for vertebrates), and JASPAR CORE vertebrate.

ENCODE
The ENCODE Project has generated hundreds of ChIP-Seq experiments for public use, spanning over 220 transcription factor and cell treatment combinations across 91 different cell lines (ENCODE Project Consortium, 2012; Landt et al.). Our package only includes peak calls from ENCODE ChIP-Seq experiments targeting transcription factors. The flat-file directory contains the downloadable files associated with the ENCODE Uniform TFBS composite track, which can be downloaded at:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/ 
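
As an illustration of how such peak calls might be consumed, the sketch below assumes the files follow the tab-separated UCSC narrowPeak layout (chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue, peak) with 0-based, half-open coordinates. The Peak type, both function names, and the example lines are hypothetical and are not part of our pipeline.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    signal: float


def parse_narrowpeak(lines: Iterable[str]) -> Iterator[Peak]:
    """Yield Peak records from narrowPeak-formatted text lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Column 7 (index 6) is signalValue in the narrowPeak layout.
        yield Peak(fields[0], int(fields[1]), int(fields[2]), float(fields[6]))


def peaks_overlapping(peaks: Iterable[Peak], chrom: str,
                      start: int, end: int) -> List[Peak]:
    """Return peaks that overlap the half-open interval [start, end)."""
    return [p for p in peaks
            if p.chrom == chrom and p.start < end and p.end > start]


# Made-up example lines, for illustration only.
example_lines = [
    "chr1\t100\t200\t.\t1000\t.\t25.5\t-1\t3.2\t50",
    "chr1\t500\t650\t.\t800\t.\t12.0\t-1\t2.1\t70",
    "chr2\t100\t200\t.\t900\t.\t18.3\t-1\t2.8\t40",
]
example_peaks = list(parse_narrowpeak(example_lines))
promoter_hits = peaks_overlapping(example_peaks, "chr1", 150, 520)
```

In practice the downloaded files are gzip-compressed, so the lines would typically be read through Python's gzip module in text mode before being passed to the parser.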

ChEA
The ChEA database is an up-to-date repository of data from experiments such as ChIP-chip, ChIP-seq, ChIP-PET, and DamID, which profile the binding of transcription factors to DNA at a genome-wide scale. It contains 189,933 interactions, manually extracted from 87 publications, describing the binding of 92 transcription factors to 31,932 target genes (Lachmann et al., 2010). The database is freely available at: http://amp.pharm.mssm.edu/lib/chea.jsp.

miRecords
miRecords is a resource for animal miRNA-target interactions. It consists of two components, of which we use only one: the Validated Targets component, a large, high-quality database of experimentally validated miRNA targets resulting from meticulous literature curation (Xiao F et al., Nucleic Acids Res., 2009).

miRTarBase
miRTarBase has accumulated more than fifty thousand miRNA-target interactions. These are collected by manually surveying the pertinent literature after systematic text mining to filter research articles related to functional studies of miRNAs (Hsu SD et al., Nucleic Acids Res., 2009).