Peak calling
See manual for Call Peaks@Macs2 and paper Model-based Analysis of ChIP-Seq (MACS)
input
It is tagAlign
(a bed format but score represent 1000/alignCounts.
call peak
macs2 callpeak -t xxx.PE2SE.nodup.tn5.tagAlign.gz -f BED \
-n xxx.PE2SE.nodup.tn5.pf" -g "hs" -p 0.01 --nomodel \
--shift -75 --extsize 150 -B --SPMR --keep-dup all --call-summits
-t/--treatment
: the only required parameter; file format can be setted by--format
- it supports: SAM,BAM, BED, BEDPE and etc.
- default format is
AUTO
--nomodel
and--extsize 150
tell MACS2 use 150bp as fragment size to pileup sequence- note:
150
is default value for the smoothing window size; but it can be set - shift size is defined by half of the smoothing window size.
- note:
-g "hg"
: use hg genome size (mappable) as background-B --SPMR
: generate pileup s(S)ignal file of 'fragment pileup p(P)er m(M)illion r(R)eads' in bedGraph (B) format- Sort by Col8 in descending order and replace long peak names in Column 4 with Peak_;
-call-summits
: use the big peak region to find different peak summit (with different scores and positions)- Output
*treat_pileup.bdg
and*control_lamda.bdg
bedGraph files
Algorithm
__call_peaks_wo_control
in PeakDetect- first get
scorecalculator
byCallerFromAlignments
class init from MACS2.IO - prepare p-q table (FDR adjusted by total base-pair number)
- then
scorecalculator.call_peaks
based on qval or pval cutoff- peaks are called in each chromosome independently
__chrom_call_peak_using_certain_criteria
and return as class defined inPeakIO
__cal_pscore
,__cal_qscore
,__cal_FE
,__cal_subtraction
are the functions/methods ofCallerFromAlignments
sumit
is called through__close_peak_with_subpeaks
method and using smooth length i.e. fragment size- pscore is calculated by use Poisson distribution (upper) written in C
- peaks are called in each chromosome independently
subcommands
filterdup
: filter redundant reads at each genomic locipredictd
: decide the fragment lengthd
pileup
: generate pileup track by extending reads- 5' to 3' direction for ChIP-seq
-B
to extend in both direction for the cutting site is in the middle of the fragment- Build local bias track: * local bias from the maximum bias from
- surrounding 1kb (
--slocal
) : 1.bdg - 10kb (
--llocal
): 2.bdg - the size of fragement length
d
(d background): 3.bdg - genome background
= number_of_control_reads*fragment_length/genome_size
: gval - max:
macs2 bdgcmp -m max -t 1.bdg -c 2.bdg -o x.bdg
(recursive compare) and the last step is to combine genome wide backgroundmacs2 bdgopt 0i xxx.bdg -m max -p gval -o local_lambda_raw.bdg
- scale to the same seq length
macs2 bdgopt -i local_lambda_raw.bdg -m multiply -p tot_reads_frac -o local_lambda.bdg
- surrounding 1kb (
bdgcmp
:macs2 bdgcmp -t treat_pileup.bdg -c local_lambda.bdg -m qpios -o qval.bdg
or-m ppois -o pval.bdg
for -log10(pval) or -log10(qval- Benjamini-Hochberg procedure),- for each basepair, did local Poisson test (treat signal against local lambda)
bdgopt
bdgpeakcall
bdgbroadcall
(in case of broad mark)
Sort
sort -k 8gr,8gr xxx.PE2SE.nodup.tn5.pf"_peaks.narrowPeak | awk 'BEGIN{OFS="\t"}{$4="Peak_"NR ; print $0}' | gzip -nc > xxx.PE2SE.nodup.tn5.pf.narrowPeak.gz
- Sort, rename peak and report to
narrowPeak.gz
file
Comments
- MACS2 published as Model-based analysis but we use only
nomodel
options here.