08 Motif analysis
- Download the data from GEO or ENA
- Basic knowledge of downloading data from GEO or ENA
Time estimation: 15 minutes
For the motif analysis, you first need to extract the sequences corresponding to the peaks. There are several ways to do this (as usual…). If you work on a UCSC-supported organism, the easiest is to use RSAT fetch-sequences. Here, we will use Bedtools, as we have the genome of interest at our disposal (Escherichia_coli_K12.fasta). However, we have to index the fasta file first to make it easy to access.
Which tool can be used to index the fasta file?
When you search for modules containing the word fasta, you find a tool called SAMtools.FastaIndex, which indexes a reference sequence in fasta format. This is exactly what we need.
Use this tool to index the E. coli genome and copy the resulting .fai file to the Files tab (in the same folder as the fasta file).
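To make clear what the .fai file contains, here is a minimal Python sketch that computes the same five columns as a fasta index: sequence name, sequence length, byte offset of the first base, bases per line, and bytes per line. The function name build_fai is hypothetical and the sketch is only an illustration of the file layout, not the code run by the SAMtools.FastaIndex module. It assumes every sequence uses a consistent line length, as the index format requires.

```python
import io


def build_fai(fasta_bytes: bytes):
    """Return (name, length, offset, linebases, linewidth) for each sequence.

    Hypothetical helper for illustration; it mirrors the column layout of a
    .fai index but performs none of the validation a real indexer does.
    """
    records = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte position where the current line starts

    for line in io.BytesIO(fasta_bytes):
        if line.startswith(b">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, length, offset, linebases, linewidth))
            name = line[1:].split()[0].decode()
            length = linebases = linewidth = 0
            offset = pos + len(line)  # first base follows the header line
        elif name is not None:
            bases = len(line.rstrip(b"\r\n"))
            if linebases == 0 and bases > 0:
                # The first sequence line defines the line geometry.
                linebases, linewidth = bases, len(line)
            length += bases
        pos += len(line)

    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records


if __name__ == "__main__":
    demo = b">chr1 test\nACGTAC\nGTAC\n>chr2\nAAAA\n"
    for record in build_fai(demo):
        print("\t".join(str(field) for field in record))
```

With the offset and line geometry stored, a tool can jump straight to any position in a sequence without reading the whole file, which is why the index has to sit next to the fasta file.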
How to extract sequences corresponding to the peaks?
Use the BEDTools.fastaFromBed module for this.
- The input file is the fasta file of the E. coli genome that you uploaded to the server.
- The bed file is the file with the peaks that was generated by MACS (narrowPeak).
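The extraction step can be sketched in a few lines of Python. The functions read_fasta and extract_peaks are hypothetical names used for illustration only; the module itself runs BEDTools. The sketch relies on two facts about the formats: BED and narrowPeak coordinates are 0-based with an exclusive end, and the first three columns are the sequence name, start and end. The chrom:start-end headers are an assumption about the default naming of the output records.

```python
def read_fasta(text):
    """Parse fasta text into a dict mapping sequence name to sequence."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def extract_peaks(genome, bed_text):
    """Return (header, sequence) pairs for each BED/narrowPeak interval."""
    records = []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # 0-based, end-exclusive coordinates map directly onto a Python slice.
        records.append((f"{chrom}:{start}-{end}", genome[chrom][start:end]))
    return records


if __name__ == "__main__":
    genome = read_fasta(">chr1\nACGTACGTAC\n")
    for header, seq in extract_peaks(genome, "chr1\t2\t6\tpeak1\n"):
        print(f">{header}\n{seq}")
```

Note that the sequence name in the first column of the peak file must match a header in the fasta file exactly, otherwise the lookup fails.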
Save the resulting .fa file to your computer.
To detect transcription factor motifs, you will use the Regulatory Sequence Analysis Tools (RSAT). It has a specific teaching server recommended for training sessions: http://pedagogix-tagc.univ-mrs.fr/rsat/. You will use the program peak-motifs.
How to find the peak-motifs program
In the left menu, click on NGS ChIP-seq and then click on peak-motifs. A new page opens, with a form.
The default peak-motifs web form only displays the essential options. There are only two mandatory parameters.
Fill the mandatory options
- The title box, which you will set to FNR Anaerobic.
- The sequences, which you will upload from your computer by clicking on the Choose file button and selecting the file FNR_anaerobic_combined_peaks.fa.
We will now modify some of the advanced options in order to fine-tune the analysis according to your data set.
Fill the advanced options
- Open the “Reduce peak sequences” title, and make sure the Cut peak sequences: +/- option is set to 0 (we wish to analyze our full dataset).
- Open the “Motif Discovery parameters” title, and check the oligomer sizes 6 and 7 (but not 8). Check “Discover over-represented spaced word pairs [dyad-analysis]”
- Under “Compare discovered motifs with databases”, remove “JASPAR core vertebrates” and add RegulonDB prokaryotes (2015_08), as the studied organism is the bacterium E. coli.
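To give an intuition for what the oligomer sizes 6 and 7 refer to, the sketch below counts every word of length k in a set of peak sequences. It is only a toy illustration with a hypothetical function name (count_oligos): it counts on a single strand and reports raw counts, whereas peak-motifs assesses over-representation statistically, which this sketch does not attempt.

```python
from collections import Counter


def count_oligos(sequences, k):
    """Count all words of length k (k-mers) across the given sequences.

    Hypothetical helper for illustration; words containing characters
    other than A, C, G or T (for example N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


if __name__ == "__main__":
    peaks = ["TTGATAAAA", "ccTTGATAcc"]
    for k in (6, 7):
        print(k, count_oligos(peaks, k).most_common(3))
```

Larger word sizes give more specific motifs but need more sequences to reach meaningful counts, which is one reason to restrict the analysis to sizes 6 and 7 on a small bacterial dataset.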
Launch the analysis
- You can indicate your email address in order to receive notification of the task submission and completion. This is particularly useful because the full analysis may take some time for very large datasets.
- Click on the button “GO”. As soon as the query has been launched, you should receive an email confirming the task submission and providing a link to the future result page.
The Web page also displays a link. You can already click on this link: the report will be progressively updated during the processing of the workflow.
Key points
- SRA ID
Further information, including links to documentation and original publications, regarding the tools, analysis techniques and the interpretation of results described in this tutorial can be found here.