Peak calling
See manual for Call Peaks@Macs2 and paper Model-based Analysis of ChIP-Seq (MACS)
input
It is tagAlign (a bed format but score represent 1000/alignCounts.
call peak
macs2 callpeak -t xxx.PE2SE.nodup.tn5.tagAlign.gz -f BED \
-n xxx.PE2SE.nodup.tn5.pf" -g "hs" -p 0.01 --nomodel \
--shift -75 --extsize 150 -B --SPMR --keep-dup all --call-summits
-t/--treatment: the only required parameter; file format can be setted by--format- it supports: SAM,BAM, BED, BEDPE and etc.
- default format is
AUTO
--nomodeland--extsize 150tell MACS2 use 150bp as fragment size to pileup sequence- note:
150is default value for the smoothing window size; but it can be set - shift size is defined by half of the smoothing window size.
- note:
-g "hg": use hg genome size (mappable) as background-B --SPMR: generate pileup s(S)ignal file of 'fragment pileup p(P)er m(M)illion r(R)eads' in bedGraph (B) format- Sort by Col8 in descending order and replace long peak names in Column 4 with Peak_;
-call-summits: use the big peak region to find different peak summit (with different scores and positions)- Output
*treat_pileup.bdgand*control_lamda.bdgbedGraph files
Algorithm
__call_peaks_wo_controlin PeakDetect- first get
scorecalculatorbyCallerFromAlignmentsclass init from MACS2.IO - prepare p-q table (FDR adjusted by total base-pair number)
- then
scorecalculator.call_peaksbased on qval or pval cutoff- peaks are called in each chromosome independently
__chrom_call_peak_using_certain_criteriaand return as class defined inPeakIO __cal_pscore,__cal_qscore,__cal_FE,__cal_subtractionare the functions/methods ofCallerFromAlignmentssumitis called through__close_peak_with_subpeaksmethod and using smooth length i.e. fragment size- pscore is calculated by use Poisson distribution (upper) written in C
- peaks are called in each chromosome independently
subcommands
filterdup: filter redundant reads at each genomic locipredictd: decide the fragment lengthdpileup: generate pileup track by extending reads- 5' to 3' direction for ChIP-seq
-Bto extend in both direction for the cutting site is in the middle of the fragment- Build local bias track: * local bias from the maximum bias from
- surrounding 1kb (
--slocal) : 1.bdg - 10kb (
--llocal): 2.bdg - the size of fragement length
d(d background): 3.bdg - genome background
= number_of_control_reads*fragment_length/genome_size: gval - max:
macs2 bdgcmp -m max -t 1.bdg -c 2.bdg -o x.bdg(recursive compare) and the last step is to combine genome wide backgroundmacs2 bdgopt 0i xxx.bdg -m max -p gval -o local_lambda_raw.bdg - scale to the same seq length
macs2 bdgopt -i local_lambda_raw.bdg -m multiply -p tot_reads_frac -o local_lambda.bdg
- surrounding 1kb (
bdgcmp:macs2 bdgcmp -t treat_pileup.bdg -c local_lambda.bdg -m qpios -o qval.bdgor-m ppois -o pval.bdgfor -log10(pval) or -log10(qval- Benjamini-Hochberg procedure),- for each basepair, did local Poisson test (treat signal against local lambda)
bdgoptbdgpeakcallbdgbroadcall(in case of broad mark)
Sort
sort -k 8gr,8gr xxx.PE2SE.nodup.tn5.pf"_peaks.narrowPeak | awk 'BEGIN{OFS="\t"}{$4="Peak_"NR ; print $0}' | gzip -nc > xxx.PE2SE.nodup.tn5.pf.narrowPeak.gz
- Sort, rename peak and report to
narrowPeak.gzfile
Comments
- MACS2 published as Model-based analysis but we use only
nomodeloptions here.