Thu. Nov 21st, 2024

Taher et al. Genome Biology 2013, 14:R117

N Additional file 1). For example, liver and heart enhancer predictions in the loci of highly expressed genes are significantly more conserved than the promoter sequences used as the basis for making predictions (0.41 versus 0.34, and 0.43 versus 0.37, with P-values of 1.1 × 10⁻³² and 5.6 × 10⁻³³, respectively, calculated using the Wilcoxon rank-sum test). For models that did not perform well in terms of the fold enrichment between the proportion of enhancer predictions in the loci of highly and lowly expressed genes (for example, skin and fetal brain), we observed significantly less constrained predictions. We observed similar trends when we applied our promoter-based classifiers to investigate unconstrained sequences (see Supplementary notes in Additional file 1).

In summary, our results indicate the existence of largely disjoint sets of tissue-specific regulatory sequences located in the neighborhood of their potential target genes. They also confirm an important role for evolutionarily constrained sequences, in that 73% of sequences conserved across mammals exhibit regulatory potential. Finally, consistent with previous studies, they support a role for both promoters and enhancers in determining spatiotemporal patterns of gene expression.

Conclusions

By analyzing the sequence of promoters of tissue-specific genes, we confirmed that tissue-specific promoters and enhancers share TF binding motifs within the loci of their cognate genes. Moreover, we observed that regulatory information in the promoters of tissue-specific genes is predictive of the enhancers targeting these genes. For 73/79 tissues, we could reliably distinguish between highly and lowly expressed genes based exclusively on the presence or absence of putative motifs (AUC ≥ 60%).
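To make the AUC criterion concrete, the following is a minimal sketch of how a classifier that scores promoters by motif presence or absence could be evaluated. It is not the authors' pipeline: the motif names, the integer weights, and the toy gene sets are all hypothetical, and the scoring rule is a plain weighted sum chosen only for illustration. The AUC is computed directly from its definition as the probability that a randomly chosen highly expressed gene outscores a randomly chosen lowly expressed one.

```python
from itertools import product

# Hypothetical motif weights (illustrative only, not taken from the study).
WEIGHTS = {"motif_A": 5, "motif_B": 4, "motif_C": -1}


def score(motifs_present):
    """Sum the weights of the motifs found in one promoter."""
    return sum(WEIGHTS[m] for m in motifs_present)


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data (assumed): each set lists the motifs present in one gene's promoter.
high_expr = [{"motif_A", "motif_B"}, {"motif_A"}, {"motif_B", "motif_C"}, {"motif_C"}]
low_expr = [set(), {"motif_C"}, {"motif_A", "motif_C"}, set()]

auc_value = auc([score(g) for g in high_expr], [score(g) for g in low_expr])
print(f"AUC = {auc_value:.2f}")
```

With these toy sets, 11.5 of the 16 gene pairs are ranked correctly, so the AUC is roughly 0.72. A value of 0.5 would correspond to random ranking and 1.0 to perfect separation, which is why a cut-off just above 0.5 indicates only modest discriminative power.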
Although similar cut-offs have been recently employed (for example, [18,59]), we recognize that the half of the models exhibiting modest performance (AUC < 80%) might have limited predictive value. It is, however, important to note that the reported AUCs represent a lower bound on classifier accuracy, because the strength of the tissue-specific enhancer signal is expected to vary among the promoters of tissue-specific genes. Promoters containing only weak signals will inherently deflate the classification AUC estimates. To further assess the performance of the classifiers at predicting tissue-specific enhancers, we introduced a panel of independent computational and experimental tests, which ultimately validated our analysis. Many of the TFs binding to the motifs identified as relevant to each of these models are known to play a fundamental role in the development or maintenance of normal function of the corresponding tissues. We showed that the motifs found in promoter regions can be used to predict enhancers with matching tissue specificity. The accuracy of our tissue-specific enhancer predictions by promoter-based models is supported by a highly significant association of enhancer predictions with the genes most highly expressed in a given tissue, and by a significant overlap of predictions with experimentally identified tissue-specific enhancers. More importantly, 58% (7/12) of liver enhancer predictions generated by the promoter-based model drove luciferase expression in the liver following hydrodynamic tail vein injection in mice, whereas none of the five negative controls did. Six of the seven.