Lesson 2: Precision Quality Control (QC)
Why quality control matters
Before downstream analysis, assess raw read quality to detect artifacts that can bias results.
Common issues include:
- Adapter contamination
  - Synthetic adapter sequence is retained in reads
- Base-calling errors
  - Quality often drops toward read ends in long or noisy runs
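As a rough illustration of what adapter contamination looks like in the data, the sketch below counts how many reads in a gzipped FASTQ file contain a given adapter prefix. The function name `count_adapter_reads` is hypothetical, the default sequence AGATCGGAAGAGC (a common Illumina adapter prefix) is an assumption about the library kit rather than something this lesson specifies, and the code assumes standard four-line FASTQ records. It is a quick sanity check only, not a substitute for the adapter content module in FastQC.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads whose sequence line contains an adapter prefix.
# Assumes four-line FASTQ records (header, sequence, '+', qualities).
count_adapter_reads() {
    local fastq_gz="$1"
    local adapter="${2:-AGATCGGAAGAGC}"   # assumed Illumina adapter prefix
    gzip -dc "$fastq_gz" | awk -v adapter="$adapter" '
        NR % 4 == 2 {                      # sequence lines only
            total++
            if (index($0, adapter) > 0) hits++
        }
        END { printf "%d of %d reads contain %s\n", hits, total, adapter }
    '
}
```

It could be called as `count_adapter_reads ~/Training/short_reads/unpaired/sample.fastq.gz`, where `sample.fastq.gz` is a placeholder file name.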
Phred quality score
Phred score (Q) expresses the probability (P) that a base call is incorrect.
Q = -10 * log10(P)
Reference points:
- Q10 -> error probability 1 in 10 -> 90% accuracy
- Q20 -> error probability 1 in 100 -> 99% accuracy
- Q30 -> error probability 1 in 1,000 -> 99.9% accuracy
Tools: FastQC and MultiQC
Use two standard tools together:
- FastQC: generates per-sample quality reports
- MultiQC: aggregates many FastQC outputs into one comparison report
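Both tools report base quality as Phred scores, so it can help to convert between Q and error probability by hand. The following is a minimal sketch of the relationship Q = -10 * log10(P) from the Phred section, using awk for the arithmetic; the function names `phred_to_prob` and `prob_to_phred` are illustrative and not part of either tool.

```shell
#!/usr/bin/env bash
# Convert a Phred score Q to an error probability: P = 10^(-Q/10).
phred_to_prob() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# Convert an error probability P to a Phred score: Q = -10 * log10(P).
# awk only provides the natural log, so log10(P) = log(P) / log(10).
prob_to_phred() {
    awk -v p="$1" 'BEGIN { printf "%.1f\n", -10 * log(p) / log(10) }'
}

# Print the error probability for the three reference scores.
for q in 10 20 30; do
    printf 'Q%s -> P = %s\n' "$q" "$(phred_to_prob "$q")"
done
```

The loop reproduces the reference points: Q10, Q20, and Q30 map to error probabilities of 0.1, 0.01, and 0.001.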
Hands-on workflow
1) Activate environment
    conda activate bioinfo
2) Run FastQC on training reads
    mkdir -p ~/Training/QC_Results
    fastqc ~/Training/short_reads/unpaired/*.fastq.gz -o ~/Training/QC_Results
3) Aggregate reports with MultiQC
    cd ~/Training/QC_Results
    multiqc .
What to inspect in the MultiQC report
Focus on these sections:
- Per-base sequence quality
- Adapter content
- Overrepresented sequences
If per-base quality at the read ends drops below about Q20, or adapter content rises along the read, trim the reads before assembly.
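The FastQC and MultiQC steps of the hands-on workflow can also be collected into one reusable function. This is a sketch under stated assumptions: `run_qc` is a hypothetical name, and `fastqc` and `multiqc` are assumed to be on PATH already (for example after `conda activate bioinfo`). Unlike the manual steps, it passes the output directory to MultiQC with `-o` instead of changing into it first.

```shell
#!/usr/bin/env bash
# Sketch: run FastQC on every gzipped FASTQ file in a directory, then MultiQC.
# Assumes fastqc and multiqc are on PATH (e.g. after `conda activate bioinfo`).
run_qc() {
    local reads_dir="$1"
    local out_dir="$2"
    mkdir -p "$out_dir" || return 1
    # One FastQC report per input file, written to out_dir.
    fastqc "$reads_dir"/*.fastq.gz -o "$out_dir" || return 1
    # Aggregate the FastQC outputs; -o sets the MultiQC output directory.
    multiqc "$out_dir" -o "$out_dir"
}
```

With the training layout used in this lesson, the call would be `run_qc ~/Training/short_reads/unpaired ~/Training/QC_Results`.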