Comparing BAM, BEDPE and tagAlign

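Note: alignments printed by samtools view use 1-based coordinates, while BED-style formats are 0-based and half-open. A read with POS p that covers n reference bases therefore becomes the BED interval [p-1, p-1+n).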

BAM

cmd:

#samtools view xxx.trim.PE2SE.bam | head -n1 
samtools view Mouse_brain_Islet_R1.trim.PE2SE.bam chr14:22142572-22142725| grep "7001113:845:HYH22BCXY:1:1101:10000:44312"

Output:

7001113:845:HYH22BCXY:1:1101:10000:44312        163     chr14   22142573        44      50M     =       22142676        153     GTCTTTTCCTTGGAAGGAAAAGATGTAATAATCTCAGTTTTGGATAAAAT  \
    DDDDDIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIGIIIIHIIIII      AS:i:100        XS:i:42 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:50 YS:i:100        YT:Z:CP                          
7001113:845:HYH22BCXY:1:1101:10000:44312        83      chr14   22142676        44      50M     =       22142573        -153    CCCAGCATCTACATTACAGACTTCAATGAAGGAAGTAAAAATATCTCAAT  \
    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDDDD      AS:i:100        XS:i:44 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:50 YS:i:100        YT:Z:CP

For this comparison, the fields of interest start at column 3 (RNAME).

Col  Field  Type    Brief description
1    QNAME  String  Query template NAME
2    FLAG   Int     bitwise FLAG
3    RNAME  String  Reference sequence NAME
4    POS    Int     1-based leftmost mapping POSition
5    MAPQ   Int     MAPping Quality
6    CIGAR  String  CIGAR string
7    RNEXT  String  Reference name of the mate/next read
8    PNEXT  Int     Position of the mate/next read
9    TLEN   Int     observed Template LENgth
10   SEQ    String  segment SEQuence
11   QUAL   String  ASCII of Phred-scaled base QUALity+33
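
The coordinate and strand relationship between the two records above and the BED-style records in the following sections can be checked by hand. The sketch below is only an illustration: the function names are hypothetical, and it assumes the reference span is the sum of the CIGAR operations that consume reference bases (M, D, N, =, X) and that the strand is read from FLAG bit 0x10.

```python
import re

# CIGAR operations that consume bases on the reference.
REF_CONSUMING_OPS = "MDN=X"


def sam_to_bed_interval(pos, cigar):
    """Turn a 1-based SAM POS plus CIGAR into a 0-based, half-open (start, end)."""
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING_OPS
    )
    start = pos - 1        # 1-based inclusive -> 0-based inclusive
    end = start + ref_len  # half-open end
    return start, end


def strand_from_flag(flag):
    """Return '-' when FLAG bit 0x10 (read reverse strand) is set, else '+'."""
    return "-" if flag & 0x10 else "+"


# POS, CIGAR and FLAG copied from the two records in the output above.
print(sam_to_bed_interval(22142573, "50M"), strand_from_flag(163))
print(sam_to_bed_interval(22142676, "50M"), strand_from_flag(83))
```

The first record (FLAG 163) gives 22142572-22142622 on the + strand and the second (FLAG 83) gives 22142675-22142725 on the - strand.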

BEDPE

cmd:

zcat xxx.trim.PE2SE.nodup.bedpe.gz | head -n1

Output:

chr14   22142675        22142725        chr14   22142572        22142622        7001113:845:HYH22BCXY:1:1101:10000:44312        44      -       +
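
The record above holds both mates on one line: columns 1-3 and 4-6 are the two intervals, column 7 is the read name, column 8 is the score (44 here, the same value as the MAPQ in the BAM output), and columns 9-10 are the two strands. As a minimal sketch of how such a line maps onto two single-end records, the hypothetical helper below splits it into two tagAlign-style lines. The "N" sequence placeholder and the constant score of 1000 are assumptions taken from the tagAlign output in the next section, not a statement of how the pipeline itself does the conversion.

```python
def bedpe_to_tagalign(line):
    """Split one BEDPE line into two tab-separated tagAlign-style lines."""
    fields = line.split()
    chrom1, start1, end1 = fields[0:3]
    chrom2, start2, end2 = fields[3:6]
    strand1, strand2 = fields[8], fields[9]
    # Assumed placeholders: "N" for the sequence column, 1000 for the score.
    return [
        "\t".join([chrom1, start1, end1, "N", "1000", strand1]),
        "\t".join([chrom2, start2, end2, "N", "1000", strand2]),
    ]


bedpe_line = (
    "chr14\t22142675\t22142725\tchr14\t22142572\t22142622\t"
    "7001113:845:HYH22BCXY:1:1101:10000:44312\t44\t-\t+"
)
for record in bedpe_to_tagalign(bedpe_line):
    print(record)
```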

tagAlign

cmd:

zcat xxx.trim.PE2SE.nodup.tagAlign.gz | head -n2

Output:

chr14   22142675        22142725        N       1000    - 
chr14   22142572        22142622        N       1000    +
  • col 5 is the score = 1000/alignmentCount

tagAlign - Tn5 shifted

cmd:

zcat xxx.trim.PE2SE.nodup.tagAlign.gz | head -n2

Output:

chr14   22142675        22142720        N       1000    - 
chr14   22142576        22142622        N       1000    +
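
Compared with the unshifted tagAlign records, the + strand start has moved by +4 (22142572 to 22142576) and the - strand end by -5 (22142725 to 22142720). The sketch below reproduces that shift. The function name and the offset parameters are hypothetical, and the +4/-5 values are simply read off the two outputs shown here.

```python
def tn5_shift(chrom, start, end, strand, plus_offset=4, minus_offset=5):
    """Shift the 5' end of a tagAlign interval: start on '+', end on '-'."""
    if strand == "+":
        start += plus_offset
    elif strand == "-":
        end -= minus_offset
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return chrom, start, end, strand


# Unshifted intervals copied from the tagAlign output above.
print(tn5_shift("chr14", 22142675, 22142725, "-"))
print(tn5_shift("chr14", 22142572, 22142622, "+"))
```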