Peak calling

See manual for Call Peaks@Macs2 and paper Model-based Analysis of ChIP-Seq (MACS)

input

It is tagAlign (a bed format but score represent 1000/alignCounts.

call peak

macs2 callpeak -t xxx.PE2SE.nodup.tn5.tagAlign.gz -f BED \
-n xxx.PE2SE.nodup.tn5.pf" -g "hs" -p 0.01 --nomodel \ 
--shift -75 --extsize 150 -B --SPMR --keep-dup all --call-summits

-t/--treatment: the only required parameter; file format can be setted by --format
- it supports: SAM,BAM, BED, BEDPE and etc.
- default format is AUTO
--nomodel and --extsize 150 tell MACS2 use 150bp as fragment size to pileup sequence
- note: 150 is default value for the smoothing window size; but it can be set
- shift size is defined by half of the smoothing window size.
-g "hg": use hg genome size (mappable) as background
-B --SPMR: generate pileup s(S)ignal file of 'fragment pileup p(P)er m(M)illion r(R)eads' in bedGraph (B) format
Sort by Col8 in descending order and replace long peak names in Column 4 with Peak_;
-call-summits: use the big peak region to find different peak summit (with different scores and positions)
Output *treat_pileup.bdg and *control_lamda.bdg bedGraph files

Algorithm

__call_peaks_wo_control in PeakDetect
first get scorecalculator by CallerFromAlignments class init from MACS2.IO
prepare p-q table (FDR adjusted by total base-pair number)
then scorecalculator.call_peaks based on qval or pval cutoff
- peaks are called in each chromosome independently __chrom_call_peak_using_certain_criteria and return as class defined in PeakIO
- __cal_pscore, __cal_qscore,__cal_FE,__cal_subtraction are the functions/methods of CallerFromAlignments
- sumit is called through __close_peak_with_subpeaks method and using smooth length i.e. fragment size
- pscore is calculated by use Poisson distribution (upper) written in C

subcommands

filterdup: filter redundant reads at each genomic loci
predictd: decide the fragment length d
pileup: generate pileup track by extending reads
- 5' to 3' direction for ChIP-seq
- -B to extend in both direction for the cutting site is in the middle of the fragment
- Build local bias track: * local bias from the maximum bias from
  - surrounding 1kb (--slocal) : 1.bdg
  - 10kb (--llocal): 2.bdg
  - the size of fragement length d (d background): 3.bdg
  - genome background = number_of_control_reads*fragment_length/genome_size: gval
  - max: macs2 bdgcmp -m max -t 1.bdg -c 2.bdg -o x.bdg (recursive compare) and the last step is to combine genome wide background macs2 bdgopt 0i xxx.bdg -m max -p gval -o local_lambda_raw.bdg
  - scale to the same seq length macs2 bdgopt -i local_lambda_raw.bdg -m multiply -p tot_reads_frac -o local_lambda.bdg
bdgcmp:
- macs2 bdgcmp -t treat_pileup.bdg -c local_lambda.bdg -m qpios -o qval.bdg or -m ppois -o pval.bdg for -log10(pval) or -log10(qval- Benjamini-Hochberg procedure),
- for each basepair, did local Poisson test (treat signal against local lambda)
bdgopt
bdgpeakcall
bdgbroadcall (in case of broad mark)

Sort

sort -k 8gr,8gr xxx.PE2SE.nodup.tn5.pf"_peaks.narrowPeak | awk 'BEGIN{OFS="\t"}{$4="Peak_"NR ; print $0}' | gzip -nc > xxx.PE2SE.nodup.tn5.pf.narrowPeak.gz

Sort, rename peak and report to narrowPeak.gz file

Comments

MACS2 published as Model-based analysis but we use only nomodel options here.