SetQC for the ATAC-seq experiments

The goal of SetQC is to provide detailed, useful reports that let collaborators:

  1. Get a sense of the data quality
  2. Explore the data for an initial analysis

Prerequisites

  1. Results are transferred to the data storage at the end of the libQC step.
  2. runFastQC.sh: runs FastQC for each lib
  3. All scripts are in ${EPIGEN_FOLDER}/bin/

Procedure

Input:

  • Lib numbers to include in the set
  • Set number (e.g. 4 or 4_2)
  • Whether the pipeline included a trim step or not
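The three inputs above can be collected and checked in one place before any step runs. The following is a minimal Python sketch, not part of the existing scripts: `SetQCInput` and `make_setqc_input` are hypothetical names, lib numbers are assumed to be integers, and the set-number pattern (digits, optionally followed by `_` and more digits) is inferred only from the two examples "4" and "4_2".

```python
import re
from dataclasses import dataclass

# Assumption: a set number is digits, optionally followed by "_" and digits ("4", "4_2").
SET_ID_PATTERN = re.compile(r"\d+(_\d+)?")


@dataclass(frozen=True)
class SetQCInput:
    """Hypothetical container for the three SetQC inputs."""
    lib_numbers: tuple
    set_number: str
    trimmed: bool


def make_setqc_input(lib_numbers, set_number, trimmed):
    """Validate the inputs and return a SetQCInput."""
    libs = tuple(int(n) for n in lib_numbers)  # assumption: lib numbers are integers
    if not libs:
        raise ValueError("at least one lib number is required")
    if len(set(libs)) != len(libs):
        raise ValueError("duplicate lib numbers")
    if not SET_ID_PATTERN.fullmatch(str(set_number)):
        raise ValueError(f"unexpected set number: {set_number!r}")
    return SetQCInput(libs, str(set_number), bool(trimmed))


# Illustrative values only.
cfg = make_setqc_input([41, 42, 43], "4_2", trimmed=True)
print(cfg)
```

Validating up front means a malformed set number fails immediately rather than after FastQC and MultiQC have already run.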

Steps:

  1. runFastQC
  2. runMultiQC.sh: run MultiQC for the selected libs to get the assembled PNGs and data
    • Need to specify whether the processing pipeline included an adapter trim step (names contain "trim") or not
    • cp the results to the final report folder
  3. Run setBamQC: to be done
  4. Run setPeakQC: to be done
  5. Gather

  6. Run genSetTrackJson: set up the JSON files for the signal tracks
    • cp the signal tracks to the VM share folder (so the browser API can access them)
    • cat all the JSON files from libQC
    • Use the R sub-function to remove redundant genome tracks
  7. Upload the report to the VM server for sharing
    • generate
  8. Run setQCreport to generate the HTML report
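Step 6 concatenates the per-lib track JSON and then drops the genome tracks that every lib repeats. The pipeline does this with an R helper; the Python sketch below only illustrates the merge-and-deduplicate idea. It assumes each libQC JSON file holds a list of track objects, and the field names and values in the example (`type`, `name`, `url`, `hg38`, the `.bw` files) are hypothetical.

```python
import json


def merge_track_lists(track_lists):
    """Concatenate per-lib track lists, keeping the first copy of each identical entry."""
    merged, seen = [], set()
    for tracks in track_lists:
        for track in tracks:
            # Serialize with sorted keys so key order does not affect equality.
            key = json.dumps(track, sort_keys=True)
            if key not in seen:
                seen.add(key)
                merged.append(track)
    return merged


# Hypothetical per-lib track definitions; each lib repeats the same genome track.
genome = {"type": "genome", "name": "hg38"}
lib_a = [genome, {"type": "signal", "name": "lib_41", "url": "lib_41.bw"}]
lib_b = [genome, {"type": "signal", "name": "lib_42", "url": "lib_42.bw"}]

set_tracks = merge_track_lists([lib_a, lib_b])
print(json.dumps(set_tracks, indent=2))
```

Comparing whole entries avoids depending on any particular field, so only exact repeats are removed and per-lib signal tracks are kept in their original order.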

setPeakQC

Use merged peak list

  1. Identify a merged peak set for all the libs
  2. Get the average fold-enrichment matrix (all libs × merged peak locations)
  3. Use the matrix to calculate the correlation matrix and PCA
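As an illustration of step 1 and of the correlation part of step 3, here is a minimal, standard-library Python sketch. It assumes peaks are `(chrom, start, end)` tuples with half-open coordinates and that the fold-enrichment matrix has one row per lib and one column per merged peak. The function names and all numbers are made up for the example, and Pearson correlation is an assumption since the correlation type is not specified here.

```python
from math import sqrt


def merge_peaks(peak_lists):
    """Union the peaks of all libs and merge overlapping or touching intervals."""
    all_peaks = sorted(p for peaks in peak_lists for p in peaks)
    merged = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (undefined for constant vectors)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Lib-by-lib correlation matrix from a libs x peaks matrix."""
    return [[pearson(a, b) for b in rows] for a in rows]


# Hypothetical peak calls for two libs.
lib1 = [("chr1", 100, 200), ("chr1", 500, 600)]
lib2 = [("chr1", 150, 250), ("chr2", 10, 50)]
merged = merge_peaks([lib1, lib2])
print(merged)

# Hypothetical fold-enrichment values: rows are libs, columns are merged peaks.
fold_enrichment = [
    [4.0, 2.0, 1.0],
    [8.0, 4.0, 2.0],
    [1.0, 2.0, 4.0],
]
corr = correlation_matrix(fold_enrichment)
for row in corr:
    print([round(v, 3) for v in row])
```

PCA can be computed from the same matrix after centering each column. That step is usually delegated to a numerical library rather than written by hand.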

Pairwise peak overlap fraction matrix

Essentially functions as a Venn diagram for the peaks:

  1. For each lib pair, get the overlapping peak set
    • The overlap definition is configurable: at least one base pair, or a threshold on the overlapping fraction
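A small Python sketch of the pairwise matrix follows, under stated assumptions: peaks are `(chrom, start, end)` tuples with half-open coordinates, entry `[i][j]` is the fraction of lib i's peaks that overlap any peak of lib j, and the fraction threshold is measured against the length of the lib i peak. These conventions and the function names are illustrative choices, not the pipeline's definition.

```python
def overlap_bp(a, b):
    """Number of base pairs shared by two (chrom, start, end) peaks."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def peaks_overlap(a, b, min_bp=1, min_fraction=0.0):
    """True if a and b share at least min_bp bases and min_fraction of a's length."""
    shared = overlap_bp(a, b)
    return shared >= min_bp and shared >= min_fraction * (a[2] - a[1])


def overlap_fraction_matrix(peak_sets, min_bp=1, min_fraction=0.0):
    """matrix[i][j] = fraction of peaks in set i that overlap any peak in set j."""
    matrix = []
    for peaks_i in peak_sets:
        row = []
        for peaks_j in peak_sets:
            hits = sum(
                any(peaks_overlap(a, b, min_bp, min_fraction) for b in peaks_j)
                for a in peaks_i
            )
            row.append(hits / len(peaks_i) if peaks_i else 0.0)
        matrix.append(row)
    return matrix


# Hypothetical peak calls for two libs.
lib_a = [("chr1", 100, 200), ("chr1", 500, 600)]
lib_b = [("chr1", 150, 250)]

one_bp = overlap_fraction_matrix([lib_a, lib_b])
strict = overlap_fraction_matrix([lib_a, lib_b], min_fraction=0.6)
print(one_bp)
print(strict)
```

The matrix is generally not symmetric, because each row is normalized by that lib's own peak count. The all-pairs comparison is quadratic in the number of peaks, so real peak sets would need sorted intervals or an interval tree.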

setQCReport

  • Modified the default template to allow browsing on large (lg) screens (need to replace the old template in the lib folder)
  • The track iframe can either be set to a fixed width or made Bootstrap-responsive, but its maximum width is restricted to the value defined in the template.
  • [ ] Use flexdashboard instead of the TOC structure for the report