13. Motifs in DNase I peaksΒΆ

Exercise 13.1

The exercise for this week is to compare the relation between transcription factor motif presence and DNase I signal in different cell types.

The result of your code should be a figure like the following (the values here are not real, but only shown for illustrative purposes):

Figure 1: Example heatmap for this exercise.

This heatmap shows, for every cell type and motif combination, the difference of the mean signal for regions with a motif compared to regions without a motif.

Data

In the directory /vol/cursus/CFB/dnase_motifs/ you will find two files:

  • DNase_table.tsv

  • motifs.txt

In addition, the human genome (version hg38) is present here:

  • /vol/cursus/CFB/hg38/hg38.fa

The file DNase_table.tsv looks like this:

Figure 2: Contents of DNase_table.tsv.

This tab-separated file contains DNase I signals for six hematopoietic cell types in 10,000 genomic regions. The DNase I reads were counted, log2 transformed and normalized by scaling.

The file motifs.txt contains the name and consensus sequence for seven motifs.

Retrieving sequences from a genome FASTA file.

You can use the pyfaidx module to retrieve specific sequences from a FASTA file based on genomic coordinates. See the documentation of the module here.

First, you create a FASTA object.

from pyfaidx import Fasta
genome = Fasta("my_genome.fa")

Then, you can retrieve sequences in a dictionary-like approach.

my_seq = genome["chr2"][5000000:5000020]
print(my_seq.seq)


TCATGACCATAGAGAACAGA

Approach

Think about your approach before you start writing code! You will have to:

  • scan sequences with the motif consensus sequences;

  • determine, for every motif, which sequences have a match to the motif;

  • calculate the mean signal, depending on the motif occurrence;

  • calculate the difference in mean signal of sequences with a motif match and without a motif match.

Questions

  • Make a correlation plot of the DNase signal in these hematopoietic cel types? Is this what you expect?

  • Which motifs are specific for certain cell types?

  • Does this agree with the literature?