Prepare tagAlign (get ready for peak calling)
PE BAM to SE
BAM to BEDPE:
bedtools bamtobed -bedpe -mate1 -i
- From Bedtools utility
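- BED "name" is taken from QNAME (the read name) in the BAM; RNAME (the reference sequence name) becomes the chrom column
- BED score is the mapping quality (MAPQ) by default; the -tag option uses a numeric BAM tag instead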
- Use for:
- obtaining fragment coordinates and calculating library complexity
- converting to tagAlign
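As an illustration of the first use, here is a minimal sketch of deriving fragment coordinates from the 10-column output of bamtobed -bedpe (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2). The three records are made-up toy values, and the distinct/total ratio is only a simplified stand-in for a library-complexity metric, not necessarily the one the pipeline computes.

```shell
# Toy input (hypothetical values), one read pair per line.
bedpe=$(printf 'chr1\t100\t150\tchr1\t300\t350\tread1\t42\t+\t-\nchr1\t100\t150\tchr1\t300\t350\tread2\t42\t+\t-\nchr2\t500\t550\tchr2\t620\t670\tread3\t30\t+\t-\n')

# Fragment = leftmost start to rightmost end of the two mates;
# keep only pairs whose mates map to the same chromosome.
fragments=$(printf '%s\n' "$bedpe" \
  | awk 'BEGIN{OFS="\t"} $1 == $4 { s = ($2 < $5) ? $2 : $5; e = ($3 > $6) ? $3 : $6; print $1, s, e, e - s }')

# Simplified complexity proxy: distinct fragments vs. all fragments.
total=$(printf '%s\n' "$fragments" | wc -l | tr -d ' ')
distinct=$(printf '%s\n' "$fragments" | sort -u | wc -l | tr -d ' ')

printf '%s\n' "$fragments"
echo "distinct=$distinct total=$total"
```

Here read1 and read2 cover the same fragment, so they count once among the distinct fragments.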
BEDPE to tagAlign:
awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\tN\t1000\t%s\n%s\t%s\t%s\tN\t1000\t%s\n",$1,$2,$3,$9,$4,$5,$6,$10}'
- Use awk to convert each input line (one read pair) into 2 lines (one per mate)
- The name column is set to N and the BED score is always set to 1000 (the maximum)
grep -P -v 'chrM' | gzip -nc
- Exclude mitochondrial reads (-v is --invert-match; -P is --perl-regexp. Is -P really necessary?? 'chrM' is a plain literal, so basic grep should match the same lines)
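A small worked example of the conversion, run on two made-up read pairs so the effect of each step is visible. The input values are hypothetical, and plain grep -v is used here instead of grep -P -v, on the assumption that the literal pattern needs no Perl regex.

```shell
# Toy paired-end records (hypothetical values):
# chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2
bedpe=$(printf 'chr1\t100\t150\tchr1\t300\t350\tread1\t42\t+\t-\nchrM\t10\t60\tchrM\t200\t250\tread2\t42\t+\t-\n')

# Split each pair into two tagAlign lines, then drop mitochondrial reads.
tagalign=$(printf '%s\n' "$bedpe" \
  | awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\tN\t1000\t%s\n%s\t%s\t%s\tN\t1000\t%s\n",$1,$2,$3,$9,$4,$5,$6,$10}' \
  | grep -v 'chrM')

printf '%s\n' "$tagalign"
```

Only the two chr1 mates remain, each as its own 6-column line with name N and score 1000.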
Tn5 shift
shifted_tag="$prefix.tn5.tagAlign.gz"
zcat $tag | awk -F $'\t' 'BEGIN {OFS = FS}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0}' | gzip -nc > $shifted_tag
- "+" strand reads: start shifted by +4 bp; "-" strand reads: end shifted by -5 bp; this corrects for the 9-bp duplication created by Tn5 insertion
- Output goes to *.tn5.tagAlign.gz, ready for peak calling
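To check the shift by hand, the same awk program can be run on two toy tagAlign lines (hypothetical coordinates). This sketch passes -F '\t' rather than the bash-specific $'\t' so it also runs under a plain POSIX sh; that substitution is an assumption about portability, not part of the original command.

```shell
# Two toy tagAlign lines (hypothetical values), one per strand.
tag_lines=$(printf 'chr1\t100\t150\tN\t1000\t+\nchr1\t300\t350\tN\t1000\t-\n')

# "+" reads: start + 4; "-" reads: end - 5.
shifted=$(printf '%s\n' "$tag_lines" \
  | awk -F '\t' 'BEGIN {OFS = FS}{ if ($6 == "+") {$2 = $2 + 4} else if ($6 == "-") {$3 = $3 - 5} print $0}')

printf '%s\n' "$shifted"
```

The "+" line becomes chr1 104 150 and the "-" line becomes chr1 300 345, so each read's 5' end now points at the centre of the Tn5 insertion site.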