Filter & Dedup
Filter bam
samtools view -F 1804 -f 2 -u -q 30 xxx.PE2SE.bam | sambamba sort -n /dev/stdin -o /output_dir/xxx.PE2SE.dupmark.bam
Filtered by:
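- Improper mapping marker (-F 1804): unmapped read, unmapped mate, not primary alignment, failing platform/vendor QC, duplicate
- Poor mapping quality (MAPQ < 30, -q 30), which includes multi-mapped reads
- Unmated reads: -f 2 keeps only reads flagged as properly paired (forward and reverse mates both mapped)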
output:
- [u]ncompressed bam
- Sort the bam by name (-n) to prepare for the deduplication step
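To see which bits the 1804 mask covers, the sketch below decodes a SAM flag value into bit names with plain shell arithmetic. The helper name decode_flag and the short labels are made up for illustration; the bit values follow the SAM specification.

```shell
# decode_flag (hypothetical helper): print the names of the SAM flag bits set in $1.
# Variables are global because POSIX sh has no `local`.
decode_flag() {
    flag=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:read1 128:read2 \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}     # number before the colon
        name=${pair#*:}     # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"   # append, comma-separated
        fi
    done
    echo "$out"
}

decode_flag 1804   # prints: unmapped,mate_unmapped,secondary,qc_fail,duplicate
```

Since 1804 = 4 + 8 + 256 + 512 + 1024, -F 1804 discards any read with at least one of those five bits set.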
samtools fixmate -r xxx.PE2SE.dupmark.bam (tmp) xxx.PE2SE.dupmark.bam.fixmate.bam (tmp)
- Fill in mate coordinates, ISIZE (insert size) and mate-related flags from the name-sorted bam, and remove secondary and unmapped reads (-r)
- fixmate tries to compute several attributes, for example columns 7 and 8 (RNEXT and PNEXT) and tags such as MQ, Q2 and R2
samtools view -F 1804 -f 2 -u xxx.PE2SE.dupmark.bam.fixmate.bam | sambamba sort /dev/stdin -o xxx.PE2SE.filt.bam
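- Remove reads with improper mapping marker again. A likely reason: the first pass can drop one read of a pair (for example on MAPQ), and once fixmate has updated the mate-related flags the surviving orphan no longer passes -F 1804 -f 2
- sorted by coordinate (sambamba sort without -n)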
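The effect of -F 1804 -f 2 on individual reads can be sketched as a small predicate on the flag value. The helper name passes_filter is hypothetical. Flag 97 stands for an orphan, on the assumption that fixmate leaves such a read without the proper-pair bit.

```shell
# passes_filter (hypothetical helper): succeed if a read with flag $1 would
# survive `samtools view -F 1804 -f 2`, i.e. none of the 1804 bits are set
# and the proper-pair bit (2) is set.
passes_filter() {
    flag=$1
    [ $(( flag & 1804 )) -eq 0 ] && [ $(( flag & 2 )) -eq 2 ]
}

# 99 / 147: typical properly paired mates
# 97:   paired but not flagged proper pair (assumed orphan after fixmate)
# 73:   paired with mate unmapped
# 1123: 99 plus the duplicate bit (1024)
for f in 99 147 97 73 1123; do
    if passes_filter "$f"; then
        echo "$f keep"
    else
        echo "$f drop"
    fi
done
```

Only 99 and 147 are kept. The other three fail on the missing proper-pair bit, the mate-unmapped bit and the duplicate bit respectively.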
Dedup
- Use picard to mark duplicates & QC
- the algorithm for MarkDuplicates is here
- output:
xxx.PE2SE.filt.dupmark.bam
- Then remove duplicates *
- Output the final bam file:
PE2SE.nodup.bam
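A minimal sketch of how the two dedup steps might be written, not necessarily the exact commands used here. The picard.jar location, the metrics file name and the option values are assumptions. The script only prints the commands so they can be inspected before running.

```shell
# Assumptions (not from the notes): jar path, metrics file name, option values.
PICARD_JAR=${PICARD_JAR:-picard.jar}
PREFIX=${PREFIX:-xxx.PE2SE}
FINAL_BAM=${FINAL_BAM:-PE2SE.nodup.bam}   # final name as written in the notes

# Step 1: mark duplicates (sets flag bit 1024) and write a QC metrics file.
mark_cmd="java -jar $PICARD_JAR MarkDuplicates INPUT=$PREFIX.filt.bam OUTPUT=$PREFIX.filt.dupmark.bam METRICS_FILE=$PREFIX.dup.qc VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true REMOVE_DUPLICATES=false"

# Step 2: drop the marked duplicates. 1804 includes the duplicate bit (1024),
# so the same filter used earlier now removes them.
rm_cmd="samtools view -F 1804 -f 2 -b $PREFIX.filt.dupmark.bam -o $FINAL_BAM"

echo "$mark_cmd"
echo "$rm_cmd"
# To actually run them: eval "$mark_cmd" && eval "$rm_cmd"
```

Keeping REMOVE_DUPLICATES=false in the first step leaves the duplicates in the dupmark bam, so they stay available for QC before the second step filters them out.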