MEALR combinatorial regulation analysis
Extract TRANSFAC(R) PWMs from combinatorial regulation analysis
This tool extracts TRANSFAC® PWMs from a result table generated by the MEALR combinatorial regulation analysis. The PWMs represent the transcription factor binding specificities that make up the combinatorial modules predicted by the MEALR models.
|Input MEALR search table
|Input table, a MEALR search result table
|Model accuracy cutoff
|Logistic regression coefficient cutoff
The output contains the TRANSFAC® PWMs extracted from MEALR models according to the specified cutoffs. The resulting table can be used in further analyses, e.g. to extract the corresponding transcription factors using the tool or to create a profile for binding site predictions with MATCH™ (Create profile from site model table).
|Output table column
|Highest match probability of models from which the PWM was extracted
|Highest accuracy of models from which the PWM was extracted
|Highest importance of the PWM in the extracted models
|Average match probability of models from which the PWM was extracted
|Average accuracy of models from which the PWM was extracted
|Average importance of the PWM in the extracted models
|Lowest match probability of models from which the PWM was extracted
|Lowest accuracy of models from which the PWM was extracted
|Lowest importance of the PWM in the extracted models
|Cell sources of the extracted models
|Tissue sources of the extracted models
|Transcription factors targeted in the experiments of the extracted models
|IDs of models from which the PWM was extracted
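The aggregation described by the output columns can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the `ModelHit` record, the PWM names `PWM_A`/`PWM_B`, and the choice to compare a PWM's importance (treated here as its logistic regression coefficient) directly against the coefficient cutoff are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ModelHit:
    """Hypothetical, simplified row of a MEALR search result table."""
    model_id: str
    match_probability: float
    accuracy: float
    cell: str
    tissue: str
    tf: str
    # PWM name -> importance; treating importance as the LR coefficient
    # is an assumption made for this sketch
    pwm_importance: dict = field(default_factory=dict)


def extract_pwms(hits, accuracy_cutoff, coefficient_cutoff):
    """Collect per-PWM summary statistics over hits passing both cutoffs."""
    groups = {}
    for hit in hits:
        if hit.accuracy < accuracy_cutoff:
            continue  # model below the accuracy cutoff
        for pwm, importance in hit.pwm_importance.items():
            if importance >= coefficient_cutoff:
                groups.setdefault(pwm, []).append((hit, importance))

    table = {}
    for pwm, entries in groups.items():
        probs = [h.match_probability for h, _ in entries]
        accs = [h.accuracy for h, _ in entries]
        imps = [imp for _, imp in entries]
        row = {}
        # highest / average / lowest of probability, accuracy, importance
        for name, values in (("prob", probs), ("acc", accs), ("imp", imps)):
            row[f"max_{name}"] = max(values)
            row[f"avg_{name}"] = mean(values)
            row[f"min_{name}"] = min(values)
        row["cells"] = sorted({h.cell for h, _ in entries})
        row["tissues"] = sorted({h.tissue for h, _ in entries})
        row["tfs"] = sorted({h.tf for h, _ in entries})
        row["model_ids"] = sorted({h.model_id for h, _ in entries})
        table[pwm] = row
    return table


# Usage with made-up example data
hits = [
    ModelHit("M1", 0.90, 0.80, "HepG2", "liver", "HNF4A", {"PWM_A": 1.5, "PWM_B": 0.2}),
    ModelHit("M2", 0.70, 0.90, "K562", "blood", "GATA1", {"PWM_A": 0.5}),
    ModelHit("M3", 0.95, 0.60, "HepG2", "liver", "FOXA2", {"PWM_A": 2.0}),
]
pwms = extract_pwms(hits, accuracy_cutoff=0.7, coefficient_cutoff=0.3)
```

With these inputs, model M3 is dropped by the accuracy cutoff and PWM_B by the coefficient cutoff, so only PWM_A remains, extracted from M1 and M2.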
TRANSFAC(R) MEALR combinatorial regulation analysis
This analysis applies combinatorial regulatory models (CRMs) based on the MEALR affinity score to classify sequences, or scan them, for occurrences of combinations of transcription factor binding sites represented by TRANSFAC® PWMs. The models are taken from the MEALR library, whose training data originate from the TRANSFAC® collection of high-throughput sequencing experiments.
|Sequence track or collection to search
|Sequence source associated with the sequence track, either a custom or a genomic sequence source
|Focus on models from selected cell sources
|Focus on models from selected tissue sources
|Classify entire sequence instead of scanning for hits
|Scan mode: best hit or cutoff-based
|Step size for scanning mode
|Cutoff for the probability that a sequence matches the model
|Model accuracy cutoff
|Select models with a test set accuracy equal to or better than the accuracy cutoff prior to the search
|Output folder for analysis results
Classification and scan modes
Classification mode evaluates each input sequence as a whole, whereas scan mode analyzes sequence windows offset by the given step size (sliding window). In scan mode, the Best hit method reports the best-scoring sequence window regardless of any cutoff, and the Cutoff method reports the best non-overlapping windows that satisfy the specified cutoff.
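A minimal Python sketch of the two scan methods follows. It assumes that the window length equals the model (CRM region) length, and it uses a hypothetical `score` callable in place of the MEALR match probability. Selecting non-overlapping windows greedily in descending score order is an assumption about how the "best non-overlapping windows" are chosen, not a documented detail of the tool.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Window = Tuple[int, int]  # one-based, inclusive (start, end)


def scan_windows(seq_len: int, window: int, step: int) -> List[Window]:
    """Fixed-length sliding windows, shifted by `step` bases each time."""
    return [(s + 1, s + window) for s in range(0, seq_len - window + 1, step)]


def best_hit(windows: Iterable[Window],
             score: Callable[[Window], float]) -> Optional[Window]:
    """Best hit method: the single highest-scoring window, no cutoff."""
    return max(windows, key=score, default=None)


def cutoff_hits(windows: Iterable[Window],
                score: Callable[[Window], float],
                cutoff: float) -> List[Window]:
    """Cutoff method: best non-overlapping windows scoring >= cutoff.
    Greedy selection by descending score is an assumption."""
    passing = sorted((w for w in windows if score(w) >= cutoff),
                     key=score, reverse=True)
    kept: List[Window] = []
    for start, end in passing:
        # keep the window only if it overlaps none of the windows kept so far
        if all(end < k_start or start > k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


# Usage with made-up match probabilities
windows = scan_windows(seq_len=12, window=6, step=3)
scores = {(1, 6): 0.9, (4, 9): 0.7, (7, 12): 0.8}
top = best_hit(windows, scores.get)
cutoff_result = cutoff_hits(windows, scores.get, cutoff=0.5)
```

Here all three windows pass the cutoff, but window (4, 9) overlaps the higher-scoring window (1, 6), so the Cutoff method keeps only (1, 6) and (7, 12).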
The MEALR search applies sequence length limits. The minimum sequence length for both scan and classification modes is 50 bases, and classification mode supports sequences of up to 5000 bases. Ideally, input sequences for classification mode should have lengths typical of genomic regions observed in ChIP-seq studies, e.g. 500–1000 bases, whereas input sequences for scanning should not be too short, e.g. at least 300 bases. We recommend taking the difference between model length and sequence length, both of which are reported in the output table, into account when assessing the reliability of predictions.
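The length rules above can be expressed as a small pre-check. This Python sketch uses the documented limits (50 bases minimum, 5000 bases maximum for classification) as hard errors and the recommendations (500–1000 bases for classification, at least 300 for scanning) as warnings; the function names are hypothetical, and how strongly a length gap should lower confidence in a prediction is left to the user.

```python
MIN_LEN = 50                      # both modes (documented hard limit)
MAX_CLASSIFY_LEN = 5000           # classification mode only
RECOMMENDED_CLASSIFY = (500, 1000)
RECOMMENDED_SCAN_MIN = 300


def check_length(seq_len: int, classify: bool) -> list:
    """Raise ValueError on hard limits; return a list of soft warnings."""
    if seq_len < MIN_LEN:
        raise ValueError(f"sequence shorter than {MIN_LEN} bases")
    if classify and seq_len > MAX_CLASSIFY_LEN:
        raise ValueError(
            f"classification mode supports at most {MAX_CLASSIFY_LEN} bases")
    warnings = []
    lo, hi = RECOMMENDED_CLASSIFY
    if classify and not lo <= seq_len <= hi:
        warnings.append(f"length {seq_len} outside typical ChIP-seq range {lo}-{hi}")
    if not classify and seq_len < RECOMMENDED_SCAN_MIN:
        warnings.append(f"scan input shorter than {RECOMMENDED_SCAN_MIN} bases")
    return warnings


def length_gap(crm_region_len: int, seq_len: int) -> int:
    """Absolute difference between model (CRM region) and sequence length."""
    return abs(seq_len - crm_region_len)
```

For example, `check_length(200, classify=True)` passes the hard limits but returns one warning, while `check_length(6000, classify=True)` raises `ValueError`.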
Cell and tissue sources can be selected to focus on the subset of CRMs trained with data from the respective sources. Note that selecting multiple cells and/or tissues gathers all CRMs associated with any one of the selected sources.
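The union semantics of the source selection, combined with the model accuracy cutoff that is applied prior to the search, can be sketched in Python as below. The `CRM` record and the example library are hypothetical, and treating an empty selection as "no source filter" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CRM:
    """Hypothetical, simplified MEALR library entry."""
    model_id: str
    cell: str
    tissue: str
    accuracy: float  # test set accuracy


def select_models(library, cells=(), tissues=(), accuracy_cutoff=0.0):
    """Keep models whose cell OR tissue matches any selected source and
    whose test set accuracy is at least the cutoff. An empty selection
    is assumed to mean that no source filter is applied."""
    cells, tissues = set(cells), set(tissues)
    selected = []
    for m in library:
        if (cells or tissues) and m.cell not in cells and m.tissue not in tissues:
            continue  # matches none of the selected sources
        if m.accuracy >= accuracy_cutoff:
            selected.append(m)
    return selected


# Usage with a made-up library
library = [
    CRM("M1", "HepG2", "liver", 0.85),
    CRM("M2", "K562", "blood", 0.90),
    CRM("M3", "GM12878", "blood", 0.60),
    CRM("M4", "MCF-7", "breast", 0.95),
]
union = select_models(library, cells=["HepG2"], tissues=["blood"])
```

Selecting the cell HepG2 together with the tissue blood returns M1, M2 and M3: each matches at least one selected source, even though no model matches both.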
The output folder contains a table and a sequence track with information about model hits. The output table contains the start and end positions of hits, model IDs, match probabilities and other values described below. For input sequences derived from genomic regions (rather than imported as custom sequences), the table additionally includes a sequence id generated for each region as well as the genomic sequence id and the start and end coordinates.
|Output table column
|Sequence id of custom or genomic sequence
|Interval site name
|Sequence id constructed for genomic interval
|CRM region start (one-based)
|CRM region end (one-based)
|MEALR model id
|Gene symbol of transcription factor analyzed in source experiment generating training data
|Cell source of training data
|Tissue source of training data
|Test set accuracy of MEALR model
|Type of MEALR model (LR: logistic regression, WLR: LR with weighting of CRM region positions)
|Length of CRM region
|Length of analyzed sequence
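Because CRM region start and end are reported one-based, converting them for tools that expect zero-based, half-open intervals (such as the BED format) requires subtracting 1 from the start only. The Python sketch below shows this, plus a mapping from sequence-relative to genomic positions. That mapping is an assumption for illustration: it presumes that hit coordinates are relative to the analyzed sequence, that the genomic interval is on the forward strand, and that its start is also one-based.

```python
def to_bed(start_1based: int, end_1based: int) -> tuple:
    """One-based closed [start, end] -> zero-based half-open [start, end)."""
    return start_1based - 1, end_1based


def region_length(start_1based: int, end_1based: int) -> int:
    """Length of a one-based closed interval."""
    return end_1based - start_1based + 1


def to_genomic(rel_start: int, rel_end: int, interval_start: int) -> tuple:
    """Hypothetical mapping of sequence-relative, one-based hit coordinates
    onto a forward-strand genomic interval with a one-based start."""
    offset = interval_start - 1  # position 1 of the sequence = interval_start
    return rel_start + offset, rel_end + offset
```

For example, a hit at positions 11–60 of a sequence taken from an interval starting at genomic position 1001 would map to 1011–1060 under these assumptions.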
Open the tool in the user interface.
Specify sequence(s) to analyze. This should be a track item with custom or genomic sequences (Input example). A sequence source is suggested automatically.
Select cell and tissue sources for filtering. There are over 400 cell types and more than 50 tissues to choose from. Note that selecting multiple sources causes all library models associated with any one of the selected cells or tissues to be considered.
Choose Classification mode if scan mode is not desired.
For scan mode, specify a step size and select the Best hit or Cutoff method.
For the Cutoff method, specify a cutoff for the match probability of model hits.
Specify a model accuracy cutoff.
Specify an output folder for results. The folder can already exist or be newly created by the workflow (Example result folder).
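The dependencies between the steps above (a step size only in scan mode, a hit cutoff only for the Cutoff method) can be captured in a small parameter object. This Python sketch is purely illustrative: the class and field names are hypothetical and do not correspond to the tool's actual parameter identifiers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MealrSearchParams:
    """Hypothetical parameter set mirroring the steps above."""
    sequences: str
    output_folder: str
    cells: Tuple[str, ...] = ()
    tissues: Tuple[str, ...] = ()
    classify: bool = False
    scan_method: str = "Best hit"  # "Best hit" or "Cutoff"
    step_size: Optional[int] = None
    probability_cutoff: Optional[float] = None
    accuracy_cutoff: float = 0.0

    def validate(self) -> None:
        """Raise ValueError for combinations the steps above rule out."""
        if self.classify:
            return  # step size and scan method only matter in scan mode
        if self.step_size is None or self.step_size < 1:
            raise ValueError("scan mode needs a positive step size")
        if self.scan_method not in ("Best hit", "Cutoff"):
            raise ValueError("scan method must be 'Best hit' or 'Cutoff'")
        if self.scan_method == "Cutoff":
            p = self.probability_cutoff
            if p is None or not 0.0 <= p <= 1.0:
                raise ValueError("Cutoff method needs a probability cutoff in [0, 1]")


# Usage: a scan with the Cutoff method
params = MealrSearchParams("my_sequences", "results", step_size=10,
                           scan_method="Cutoff", probability_cutoff=0.8)
params.validate()
```

Keeping these checks in one place means an invalid combination, such as the Cutoff method without a cutoff, fails before any search starts.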