Peak calling

See manual for Call Peaks@Macs2 and paper Model-based Analysis of ChIP-Seq (MACS)

input

It is tagAlign (a bed format but score represent 1000/alignCounts.

call peak

macs2 callpeak -t xxx.PE2SE.nodup.tn5.tagAlign.gz -f BED \
-n xxx.PE2SE.nodup.tn5.pf" -g "hs" -p 0.01 --nomodel \ 
--shift -75 --extsize 150 -B --SPMR --keep-dup all --call-summits
  • -t/--treatment: the only required parameter; file format can be setted by --format
    • it supports: SAM,BAM, BED, BEDPE and etc.
    • default format is AUTO
  • --nomodel and --extsize 150 tell MACS2 use 150bp as fragment size to pileup sequence
    • note: 150 is default value for the smoothing window size; but it can be set
    • shift size is defined by half of the smoothing window size.
  • -g "hg": use hg genome size (mappable) as background
  • -B --SPMR: generate pileup s(S)ignal file of 'fragment pileup p(P)er m(M)illion r(R)eads' in bedGraph (B) format
  • Sort by Col8 in descending order and replace long peak names in Column 4 with Peak_;
  • -call-summits: use the big peak region to find different peak summit (with different scores and positions)
  • Output *treat_pileup.bdg and *control_lamda.bdg bedGraph files

Algorithm

  • __call_peaks_wo_control in PeakDetect
  • first get scorecalculator by CallerFromAlignments class init from MACS2.IO
  • prepare p-q table (FDR adjusted by total base-pair number)
  • then scorecalculator.call_peaks based on qval or pval cutoff
    • peaks are called in each chromosome independently __chrom_call_peak_using_certain_criteria and return as class defined in PeakIO
    • __cal_pscore, __cal_qscore,__cal_FE,__cal_subtraction are the functions/methods of CallerFromAlignments
    • sumit is called through __close_peak_with_subpeaks method and using smooth length i.e. fragment size
    • pscore is calculated by use Poisson distribution (upper) written in C

subcommands

  • filterdup: filter redundant reads at each genomic loci
  • predictd: decide the fragment length d
  • pileup: generate pileup track by extending reads
    • 5' to 3' direction for ChIP-seq
    • -B to extend in both direction for the cutting site is in the middle of the fragment
    • Build local bias track: * local bias from the maximum bias from
      • surrounding 1kb (--slocal) : 1.bdg
      • 10kb (--llocal): 2.bdg
      • the size of fragement length d (d background): 3.bdg
      • genome background = number_of_control_reads*fragment_length/genome_size: gval
      • max: macs2 bdgcmp -m max -t 1.bdg -c 2.bdg -o x.bdg (recursive compare) and the last step is to combine genome wide background macs2 bdgopt 0i xxx.bdg -m max -p gval -o local_lambda_raw.bdg
      • scale to the same seq length macs2 bdgopt -i local_lambda_raw.bdg -m multiply -p tot_reads_frac -o local_lambda.bdg
  • bdgcmp:
    • macs2 bdgcmp -t treat_pileup.bdg -c local_lambda.bdg -m qpios -o qval.bdg or -m ppois -o pval.bdg for -log10(pval) or -log10(qval- Benjamini-Hochberg procedure),
    • for each basepair, did local Poisson test (treat signal against local lambda)
  • bdgopt
  • bdgpeakcall
  • bdgbroadcall (in case of broad mark)

Sort

sort -k 8gr,8gr xxx.PE2SE.nodup.tn5.pf"_peaks.narrowPeak | awk 'BEGIN{OFS="\t"}{$4="Peak_"NR ; print $0}' | gzip -nc > xxx.PE2SE.nodup.tn5.pf.narrowPeak.gz
  • Sort, rename peak and report to narrowPeak.gz file

Comments

  • MACS2 published as Model-based analysis but we use only nomodel options here.