Prepare tagAlign (get ready for peak calling)

PE BAM to SE

BAM to BEDPE:

bedtools bamtobed -bedpe -mate1 -i 
  • Part of the bedtools suite
    • BED "name" comes from QNAME (the read name) in the BAM, not RNAME
    • Score: mapping quality (MAPQ) by default; with -bedpe, the lower MAPQ of the two mates
  • Use for:
    • obtaining fragment coordinates and calculating library complexity
    • converting to tagAlign
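
As a rough illustration of "obtaining fragment coordinates", the sketch below collapses one BEDPE record into a single fragment interval. The function name bedpe_to_fragment and the sample record are made up for this example; the column order (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2) is the one bedtools bamtobed -bedpe writes. It assumes both mates map to the same chromosome and skips pairs that do not.

```shell
# Hypothetical helper: turn each BEDPE pair into chrom, fragment start,
# fragment end and fragment length (tab-separated).
bedpe_to_fragment() {
  awk 'BEGIN{OFS="\t"} $1 == $4 {
    s = ($2 < $5) ? $2 : $5   # leftmost start of the two mates
    e = ($3 > $6) ? $3 : $6   # rightmost end of the two mates
    print $1, s, e, e - s
  }'
}

# Made-up record, for illustration only.
printf 'chr1\t100\t150\tchr1\t220\t270\tread1\t30\t+\t-\n' | bedpe_to_fragment
# chr1  100  270  170   (tab-separated)
```

Counting distinct fragment coordinates from output like this is one way library complexity could then be estimated.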

BEDPE to tagAlign:

awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\tN\t1000\t%s\n%s\t%s\t%s\tN\t1000\t%s\n",$1,$2,$3,$9,$4,$5,$6,$10}'
  • Use awk to convert each BEDPE line into 2 tagAlign lines (one per mate)
  • Score in the BED output is hard-coded to 1000 (the maximum)
grep -P -v 'chrM' | gzip -nc
  • Exclude mitochondrial reads (-v/--invert-match); -P (--perl-regexp) is not needed for a literal pattern such as 'chrM'
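
A small worked example of the conversion, run on two made-up BEDPE pairs. The wrapper name bedpe_to_tagalign and the sample records are hypothetical; the awk body is the command from these notes, and the sketch uses plain grep -v (no -P) and skips gzip so the result stays readable.

```shell
# Hypothetical wrapper: split each BEDPE pair into two tagAlign lines,
# then drop anything on chrM.
bedpe_to_tagalign() {
  awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\tN\t1000\t%s\n%s\t%s\t%s\tN\t1000\t%s\n",$1,$2,$3,$9,$4,$5,$6,$10}' \
    | grep -v 'chrM'
}

# One pair on chr1 and one on chrM (the filter removes the latter).
{
  printf 'chr1\t100\t150\tchr1\t220\t270\tread1\t30\t+\t-\n'
  printf 'chrM\t10\t60\tchrM\t90\t140\tread2\t30\t+\t-\n'
} | bedpe_to_tagalign
# chr1  100  150  N  1000  +   (tab-separated)
# chr1  220  270  N  1000  -
```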

Tn5 shift

shifted_tag="${prefix}.tn5.tagAlign.gz"
zcat $tag | awk -F $'\t' 'BEGIN {OFS = FS}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0}' | gzip -nc > $shifted_tag
  • "+" strand: start shifted by +4 bp; "-" strand: end shifted by -5 bp; this offsets the 9-bp duplication Tn5 creates at its insertion site
  • Output is written to *.tn5.tagAlign.gz, ready for peak calling
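
To make the shift concrete, here is a minimal sketch that applies the same awk logic to two made-up tagAlign lines, without the zcat/gzip steps. The function name tn5_shift and the sample coordinates are hypothetical.

```shell
# Hypothetical wrapper around the shift: "+" reads move their start by +4,
# "-" reads move their end by -5; other columns are passed through.
tn5_shift() {
  awk 'BEGIN{FS=OFS="\t"}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0 }'
}

printf 'chr1\t100\t150\tN\t1000\t+\nchr1\t220\t270\tN\t1000\t-\n' | tn5_shift
# chr1  104  150  N  1000  +   (tab-separated)
# chr1  220  265  N  1000  -
```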