Solutions — Module 6 Exercises
Note for instructors: These solutions demonstrate strong reasoning, not the only valid approach. Students may arrive at different but equally correct conclusions. Prioritise discussion of reasoning quality over answer matching. Where a student’s approach differs from these solutions but is logically sound, treat it as an opportunity for a richer conversation.
Exercise 1 Solutions: Systematic Decomposition of a New Problem
Part A: Five Questions
Q1 — Final biological output:
A strong answer includes:
- A table or text report identifying the virus species (one or more) present in each sample
- One or more phylogenetic trees (one per viral component if a bipartite genome is found)
- A comparison table showing which virus/strain was found on each farm and whether there is cross-farm sharing
- Supporting files: assembled contigs (FASTA), BLAST summary table, alignment and tree files
A weak answer says only “a phylogenetic tree” without specifying what question the tree answers, or what format the result takes.
Q2 — What data do I start with?
Key points a strong answer includes:
- 6 FASTQ files, Nanopore technology (long reads, error rate ~5–15% depending on chemistry)
- 2 samples per farm = opportunity to check within-farm reproducibility
- No reference virus sequences from these farms → de novo approach required
- Tomato reference genome available → host filtering is feasible
- Unknown: basecalling model used (affects polishing tool choice), library preparation details, whether samples are mixed co-infections
Q3 — Logical gap:

    Raw Nanopore FASTQ files (mixed: host + virus + other)
        ↓ QC and host filtering
    Viral-enriched reads (clean, reduced)
        ↓ De novo assembly
    Assembled contigs (of unknown identity)
        ↓ BLAST identification
    Annotated viral contigs (labelled by species)
        ↓ Alignment + phylogenetics
    Phylogenetic tree (evolutionary context)
        ↓ Comparison across farms
    Conclusion: same/different virus per farm

Q4 — Known failure modes (minimum four):
| Failure mode | Step | Detection |
|---|---|---|
| Tomato chloroplast escapes host filtering | Host removal | Three-GC-peak QC; extra contig in assembly BLASTs to chloroplast |
| Low viral abundance → insufficient coverage | Coverage estimation | Coverage formula < 30× after host filtering |
| Co-infection with two viruses | Assembly | Two sets of contigs of the same expected size; BLAST gives different top hits |
| Assembly collapse (two strains in one contig) | Assembly | BLAST identity unexpectedly low to all references; uneven depth |
| Reference genome too divergent from local tomato → poor host removal | Host removal | Many host-like reads in assembled contigs |
Q5 — “Good enough” thresholds:
| Phase | Threshold |
|---|---|
| Read QC | Mean quality ≥ Q8 (Nanopore baseline); if mean < Q7, flag for re-basecalling |
| Host filtering | ≥ 50% of reads mapped to host (for a field sample — less would be surprising) |
| Coverage | ≥ 30× after host removal; flag and consider combining samples if below |
| Assembly | ≥ 90% of expected genome size covered; no uncovered regions in read-back mapping |
| BLAST identification | Top hit e-value < 1e-10; identity ≥ 65% for genus-level assignment |
| Phylogenetic tree | Bootstrap support ≥ 70 for key branches |
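The BLAST identification thresholds in the table can be applied mechanically to tabular BLAST output. The following is a minimal sketch, assuming the default column order of `-outfmt 6` (column 3 is percent identity, column 11 is e-value). The function name `filter_blast_hits` and the example rows are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# filter_blast_hits [MIN_IDENTITY] [MAX_EVALUE]
# Reads BLAST tabular output (-outfmt 6) on stdin and prints only the hits
# with percent identity >= MIN_IDENTITY and e-value < MAX_EVALUE.
# Defaults follow the thresholds table: identity >= 65, e-value < 1e-10.
filter_blast_hits() {
    local min_identity=${1:-65} max_evalue=${2:-1e-10}
    # "+ 0" forces numeric comparison, including for values such as 2e-12
    awk -F'\t' -v id="$min_identity" -v ev="$max_evalue" \
        '($3 + 0) >= (id + 0) && ($11 + 0) < (ev + 0)'
}

# Hypothetical example rows (contig and reference names are made up)
{
    printf 'contig_1\tREF_A\t92.5\t2600\t150\t5\t1\t2600\t1\t2600\t0.0\t3500\n'
    printf 'contig_2\tREF_B\t61.0\t300\t100\t9\t1\t300\t1\t300\t2e-12\t80\n'
    printf 'contig_3\tREF_C\t88.0\t120\t14\t0\t1\t120\t1\t120\t3e-04\t50\n'
} | filter_blast_hits
```

Only the `contig_1` row passes both thresholds. Hits that fail should be reviewed rather than silently discarded, since a divergent virus can fall below the identity cut-off.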
Part B: Decomposition Tree
A strong tree for this problem:

    Characterise tomato virus across 3 farms
    │
    ├── Phase 1: Data quality assessment (×6 samples)
    │   ├── 1a. NanoStat per sample (read count, N50, mean quality)
    │   └── 1b. Flag samples with mean quality < Q8 or read count < 1,000
    │
    ├── Phase 2: Host read removal (×6 samples)
    │   ├── 2a. Download S. lycopersicum reference (including chloroplast genome)
    │   ├── 2b. Map with minimap2 -ax map-ont
    │   ├── 2c. Extract unmapped reads (samtools view -f 4)
    │   ├── 2d. Quantify: % mapped to host (expected: 70–95%)
    │   └── 2e. IF >99% mapped: investigate (possible very low viral load or wrong host reference)
    │
    ├── Phase 3: Coverage estimation (×6 samples)
    │   ├── 3a. Calculate coverage (reads × avg_length / expected_genome_size)
    │   │       → Use 6,000 bp as estimate (bipartite genome, ~3 kb each component)
    │   ├── 3b. IF coverage < 30×: combine with second farm sample before assembly
    │   └── 3c. Document coverage decision
    │
    ├── Phase 4: De novo assembly (×6 samples or combined pairs)
    │   ├── 4a. Flye --nano-raw --genome-size 6k
    │   ├── 4b. Evaluate: QUAST (N50, total length, contig count, circularity)
    │   ├── 4c. Polish: Medaka (match model to basecalling chemistry)
    │   └── 4d. IF assembly fails: check filtered reads file; check genome size estimate
    │
    ├── Phase 5: Contig identification (×6 assemblies)
    │   ├── 5a. BLAST all contigs vs. NCBI nt (outfmt 6, evalue 1e-5)
    │   ├── 5b. Classify contigs: viral, host chloroplast, bacterial, unknown
    │   ├── 5c. Retain only viral contigs
    │   └── 5d. IF multiple viral species detected: report as co-infection
    │
    ├── Phase 6: Annotation
    │   ├── 6a. Compare to GenBank entries (BLASTn + BLASTx)
    │   └── 6b. Predict and annotate ORFs
    │
    ├── Phase 7: Phylogenetic analysis (per viral species found)
    │   ├── 7a. Download reference sequences (same genus/family, from NCBI)
    │   ├── 7b. Align: MAFFT --auto
    │   ├── 7c. Trim: trimAl -automated1
    │   ├── 7d. Model selection + tree inference: IQ-TREE2 -m TEST -bb 1000
    │   └── 7e. Visualise: FigTree or iTOL
    │
    └── Phase 8: Cross-farm comparison
        ├── 8a. Extract assembled sequences from all 6 samples
        ├── 8b. Pairwise BLAST or whole-genome alignment (Mauve)
        └── 8c. Report: same strain / related strains / different species per farm

Part C: Relevant Patterns
Pattern 1: Mixed sample with bimodal GC — very likely in field-collected plant material. Presentation: GC plot in NanoStat or FastQC shows two peaks. Response: proceed with host filtering; do not be alarmed.
Pattern 2: Chloroplast escape from host filtering — known issue when host reference is nuclear-only. Presentation: unexpected extra contig in assembly, BLASTs to Solanaceae chloroplast. Response: include chloroplast genome in host reference; remove the contig.
Pattern 3: Co-infection — multiple viruses common in symptomatic plants. Presentation: more contigs than expected; BLAST gives different top hits for different contigs. Response: separate the viruses and analyse each independently.
Part D: Algorithm Choices with SACRED justification
Host removal:
- Tool: minimap2 + samtools
- SACRED: Speed (fast, runs in minutes); Data suitability (map-ont mode designed for Nanopore); Ease (standard, well-documented)
- Acceptable alternative: Kraken2 (better if reference is incomplete, but requires large database)
De novo assembly:
- Tool: Flye (--nano-raw)
- SACRED: Accuracy (tolerates the high Nanopore error rate through its repeat-graph approach); Robustness (designed for messy field data); Data suitability (Nanopore-specific mode)
Viral contig identification:
- Tool: BLASTn against NCBI nt
- SACRED: Accuracy (exhaustive database coverage); Speed (acceptable for small contig count)
Phylogenetic analysis:
- Tool: IQ-TREE2 with model testing (-m TEST)
- SACRED: Accuracy (maximum likelihood with bootstrap; best method for < 100 sequences); Ease (minimal configuration for standard use case)
Cross-farm comparison:
- Tool: BLASTn (pairwise) or whole-genome alignment (Mauve) + average nucleotide identity (ANI)
- Justification: You are asking about similarity between sequences, which is an alignment/similarity problem. BLAST is fast; ANI is more rigorous if you want species-level discrimination.
Exercise 2 Solutions: A GWAS Gone Wrong
Part A: Diagnosis
Most likely cause of λ = 1.89:
Decomposed causes:
- Population stratification — supported by the PCA finding (PC1 separates Kenya from Ghana)
- Cryptic relatedness — possible, but relatedness would not cause PC1 to split perfectly by country
- Batch effects — could contribute if genotyping was done in country-specific batches
- True signal — eliminated: λ = 1.89 is too high and too uniform for true signal, which concentrates at significant SNPs
Most likely cause: population stratification. The PCA shows that Kenya and Ghana form completely separate clusters. These populations have different allele frequencies at hundreds of thousands of SNPs across the genome. The association analysis, naively treating both groups as “cases and controls,” is detecting country of origin at every SNP, not disease association.
The student’s counterargument: It is incorrect. Malaria severity genetics differ between populations, but not so dramatically or uniformly that every SNP in the genome would show association. The key diagnostic to refute this claim: LD score regression (LDSC). LDSC can partition inflation into stratification and genuine polygenicity. If the intercept is >> 1.0, the inflation is confounding, not signal.
Abstraction failure: The analysis abstracted “cohort” as a single ancestrally homogeneous population. In reality, the cohort was a stratified mixture of two genetically distinct populations with different environmental exposures (different malaria transmission settings in Kenya vs. Ghana), different case definitions, and different control recruitment strategies. A correct abstraction would model the two populations separately or explicitly include ancestry as a covariate.
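For reference, the genomic inflation factor discussed above is conventionally computed as the median of the observed χ² statistics divided by the expected median of a χ² distribution with 1 degree of freedom under the null (approximately 0.4549). The following is a minimal sketch of that calculation; the function name `lambda_gc` and the input values are hypothetical.

```shell
#!/usr/bin/env bash
# lambda_gc: read one chi-square statistic (1 df) per line on stdin and print
# lambda = median(observed chi2) / 0.4549364 (median of chi2 with 1 df).
lambda_gc() {
    sort -g | awk '
        { v[NR] = $1 }                       # values arrive already sorted
        END {
            if (NR == 0) { print "NA"; exit }
            if (NR % 2) median = v[(NR + 1) / 2]
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "%.3f\n", median / 0.4549364
        }'
}

# Hypothetical toy input: three test statistics
printf '3.0\n0.2\n0.9098728\n' | lambda_gc
```

In a real GWAS the input would be the χ² column for every tested SNP. A value near 1.0 indicates little inflation, whereas a value such as 1.89 indicates inflation across the whole genome.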
Part B: Corrective Strategies
Strategy 1: Include top PCs as covariates

    plink --bfile genotypes_qc \
      --logistic \
      --covar pca_results.eigenvec \
      --covar-number 1-10 \
      --out gwas_pc_corrected

| SACRED factor | Score |
|---|---|
| Speed | High — just adds covariates to regression |
| Accuracy | Moderate — corrects for gradual stratification; may not fully correct extreme stratification |
| Cost | Low |
| Robustness | Moderate — assumes linear relationship between PCs and phenotype |
| Ease | High |
| Data suitability | Good for mild stratification; questionable for two completely separate populations |
Strategy 2: Stratified analysis + meta-analysis
Run the GWAS separately in Kenya and Ghana, then meta-analyse with a fixed-effects model (METAL or GWAMA).
| SACRED factor | Score |
|---|---|
| Speed | Moderate — two separate runs |
| Accuracy | High — removes confounding completely by population |
| Cost | Low |
| Robustness | High — does not rely on covariate assumptions |
| Ease | Moderate — requires meta-analysis software |
| Data suitability | Excellent for two clearly separated populations |
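To make the meta-analysis step concrete, here is a minimal sketch of the inverse-variance weighted fixed-effects combination for a single SNP. It illustrates the arithmetic only and is not a substitute for METAL or GWAMA. The function name `ivw_meta` and the effect sizes are hypothetical.

```shell
#!/usr/bin/env bash
# ivw_meta BETA1 SE1 BETA2 SE2
# Combines two per-population effect estimates with inverse-variance weights
# and prints: combined_beta combined_se z_score
ivw_meta() {
    awk -v b1="$1" -v s1="$2" -v b2="$3" -v s2="$4" 'BEGIN {
        w1 = 1 / (s1 * s1)                  # weight = 1 / variance
        w2 = 1 / (s2 * s2)
        beta = (w1 * b1 + w2 * b2) / (w1 + w2)
        se   = sqrt(1 / (w1 + w2))
        printf "%.4f %.4f %.2f\n", beta, se, beta / se
    }'
}

# Hypothetical log-odds ratios for one SNP: Kenya (0.20, SE 0.10), Ghana (0.40, SE 0.10)
ivw_meta 0.20 0.10 0.40 0.10
```

Because each population contributes only its own within-population estimate, allele frequency differences between Kenya and Ghana never enter the test statistic.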
Strategy 3: Genomic Control (GC correction)
Divide all test statistics by the inflation factor λ before computing p-values.

    # In R, using a summary statistics file:
    summary_stats$chi2_corrected <- summary_stats$chi2 / 1.89
    summary_stats$p_corrected <- pchisq(summary_stats$chi2_corrected, df=1, lower.tail=FALSE)

| SACRED factor | Score |
|---|---|
| Speed | Very high — post-hoc correction |
| Accuracy | Lower — blunt correction that doesn’t model stratification |
| Cost | Very low |
| Robustness | Low — overcorrects in polygenic traits; undercorrects in extreme stratification |
| Ease | Very high |
| Data suitability | Only for mild inflation (λ < 1.1) |
Recommendation for this scenario: Strategy 2 (stratified meta-analysis). The stratification is so extreme (two completely non-overlapping clusters in PCA) that covariate correction (Strategy 1) may be insufficient. GC correction (Strategy 3) would be inappropriate here — it is only suitable for mild inflation.
Part C: Open Reflection
The proposal to use only one country’s data is scientifically suboptimal:
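- Loss of sample size: Halving the sample to n=1,000 inflates standard errors by a factor of √2 (about 41%), so the expected test statistic for a true association shrinks by about 29%. The resulting loss of power depends on effect size and allele frequency, but it may eliminate the ability to detect true associations.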
- Loss of generalisability: Results from Kenya-only would apply only to the Kenyan population. Malaria genetics are population-specific; a Kenya-only result may not replicate in Ghana.
- Misses cross-population heterogeneity: Some variants may be associated in one population but not another — that heterogeneity is itself scientifically interesting.
The correct solution is Strategy 2: analyse populations separately, then meta-analyse. This is the standard approach in multi-ancestry GWAS.
Computational thinking principle: This is an abstraction problem. The student wants to simplify by using only one population, but the correct simplification is to model each population separately and then combine the results, not to discard half the data.
Exercise 3 Solutions: Pseudocode and Algorithm Design
Part A: Pseudocode
A strong pseudocode submission:

    SCRIPT: viral_filter_and_assemble.sh
    INPUTS: FASTQ_FILE, HOST_GENOME, GENOME_SIZE (default 5000)
    OUTPUTS: results/ directory structure

    FUNCTION main(fastq_file, host_genome):
        CREATE directories: results/qc/, results/mapping/, results/assembly/

        STEP 1: Read quality assessment
            RUN NanoStat on fastq_file
            SAVE report to results/qc/nanostat_report.txt
            PRINT "QC complete"

        STEP 2: Host read mapping
            RUN minimap2 -ax map-ont host_genome fastq_file → SAM
            CONVERT SAM to sorted BAM (samtools sort)
            INDEX BAM (samtools index)
            SAVE BAM to results/mapping/mapped_to_host.bam

        STEP 3: Extract unmapped reads
            RUN samtools view -f 4 -F 256 mapped_to_host.bam → FASTQ
            SAVE to results/mapping/viral_candidate_reads.fastq

        STEP 4: Coverage estimation
            read_count = COUNT_LINES(viral_candidate_reads.fastq) / 4
            avg_length = ESTIMATE_AVERAGE_LENGTH(viral_candidate_reads.fastq)
            coverage = (read_count × avg_length) / GENOME_SIZE
            PRINT "Estimated coverage: " + coverage + "×"

            IF coverage < 30:
                PRINT WARNING "Coverage below 30×. Consider combining samples."
                EXIT with code 1   # or flag, depending on design
            ELSE:
                PRINT "Coverage adequate. Proceeding to assembly."

        STEP 5: De novo assembly
            RUN flye --nano-raw viral_candidate_reads.fastq
                --genome-size GENOME_SIZE
                --out-dir results/assembly/
                --threads 4
            PRINT "Assembly complete"

        STEP 6: Mapping QC report
            RUN samtools flagstat results/mapping/mapped_to_host.bam
            SAVE to results/qc/flagstat_report.txt

        PRINT "Pipeline complete. Results in results/"

    CALL main(FASTQ_FILE, HOST_GENOME)

Part B: Bash Script

    #!/usr/bin/env bash
    # Usage: bash viral_filter_and_assemble.sh <reads.fastq> <host_genome.fa> [genome_size_bp]
    #
    # set -euo pipefail:
    #   -e: exit immediately if any command fails (non-zero exit code)
    #   -u: treat undefined variables as errors (prevents silent typos)
    #   -o pipefail: if any command in a pipe fails, the whole pipe fails
    # Together: prevents silent failures from cascading through the pipeline
    set -euo pipefail

    # --- Arguments ---
    READS="${1:?Usage: $0 <reads.fastq> <host_genome.fa> [genome_size_bp]}"
    HOST_GENOME="${2:?Must provide host genome FASTA}"
    GENOME_SIZE="${3:-5000}"   # Default 5000 bp if not provided
    THREADS=4

    # --- Directory setup ---
    mkdir -p results/qc results/mapping results/assembly

    echo "=== Step 1: Read quality assessment ==="
    NanoStat --fastq "$READS" --outdir results/qc/ --name "raw_reads" \
        > results/qc/nanostat_stdout.txt 2>&1
    echo "  NanoStat complete. Report in results/qc/"

    echo "=== Step 2: Map reads to host genome ==="
    minimap2 -ax map-ont -t "$THREADS" "$HOST_GENOME" "$READS" \
        | samtools sort -o results/mapping/mapped_to_host.bam -
    samtools index results/mapping/mapped_to_host.bam
    echo "  Mapping complete."

    echo "=== Step 3: Extract non-host (unmapped) reads ==="
    # -b keeps the stream in BAM format (header included) for samtools fastq
    samtools view -b -f 4 -F 256 results/mapping/mapped_to_host.bam \
        | samtools fastq - > results/mapping/viral_candidate_reads.fastq
    echo "  Non-host reads extracted."

    echo "=== Step 4: Estimate coverage ==="
    LINE_COUNT=$(wc -l < results/mapping/viral_candidate_reads.fastq)
    READ_COUNT=$(( LINE_COUNT / 4 ))
    # Total bases: summed length of the sequence line in each 4-line FASTQ record.
    # "+0" makes awk print 0 instead of an empty string when the file is empty.
    TOTAL_BASES=$(awk 'NR%4==2 {total+=length($0)} END {print total+0}' \
        results/mapping/viral_candidate_reads.fastq)
    if [ "$READ_COUNT" -eq 0 ]; then
        echo "ERROR: No non-host reads found. Check host genome and input reads." >&2
        exit 1
    fi
    AVG_LENGTH=$(( TOTAL_BASES / READ_COUNT ))
    # reads × average length equals total bases; dividing total bases directly
    # avoids compounding integer-division rounding
    COVERAGE=$(( TOTAL_BASES / GENOME_SIZE ))
    echo "  Reads: $READ_COUNT | Avg length: ${AVG_LENGTH} bp | Coverage: ~${COVERAGE}×"

    if [ "$COVERAGE" -lt 30 ]; then
        echo "  WARNING: Coverage ($COVERAGE×) is below the recommended 30× threshold."
        echo "  Consider combining with additional samples before assembly."
        echo "  Proceeding anyway; assembly quality may be reduced."
    fi

    echo "=== Step 5: De novo assembly with Flye ==="
    flye --nano-raw results/mapping/viral_candidate_reads.fastq \
        --genome-size "${GENOME_SIZE}" \
        --out-dir results/assembly/ \
        --threads "$THREADS"
    echo "  Assembly complete. Results in results/assembly/"

    echo "=== Step 6: Mapping QC report ==="
    samtools flagstat results/mapping/mapped_to_host.bam \
        > results/qc/flagstat_report.txt
    echo "  Flagstat report saved to results/qc/flagstat_report.txt"

    echo "=== Pipeline complete ==="
    echo "All results in: results/"

Part C: Algorithm Evaluation
1. Risk of incorrect genome size:
- False fail (genome size too large): If genome size is set to 50,000 bp instead of 5,000 bp, the coverage estimate is 10-fold too low. A sample with true 200× coverage would appear to have 20×, so the low-coverage warning would trigger even though the data is excellent. The assembly still runs and may be fine, but the coverage check loses its diagnostic value.
- False pass (genome size too small): If genome size is set to 500 bp instead of 5,000 bp, the coverage estimate is 10-fold too high. A sample with genuine 10× coverage (insufficient) would appear to have 100×, so the script would proceed to assembly without any warning and produce a poor or failed assembly.
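The sensitivity of the coverage check to the genome size argument can be seen with a few lines of arithmetic. This is a minimal sketch, assuming a hypothetical sample with 1,000,000 bp of non-host reads and a true genome size of 5,000 bp (true coverage 200×). The function name `coverage_estimate` is illustrative and is not part of the script above.

```shell
#!/usr/bin/env bash
# coverage_estimate TOTAL_BASES GENOME_SIZE -> integer coverage (rounded down)
coverage_estimate() {
    local total_bases=$1 genome_size=$2
    echo $(( total_bases / genome_size ))
}

total_bases=1000000   # hypothetical: 1 Mb of non-host reads
for size in 500 5000 50000; do
    cov=$(coverage_estimate "$total_bases" "$size")
    if [ "$cov" -lt 30 ]; then verdict="WARN (below 30x)"; else verdict="ok"; fi
    echo "genome_size=${size} coverage=${cov}x ${verdict}"
done
```

The same reads yield 2000×, 200×, and 20× depending only on the size argument. A genome size that is too large deflates the estimate and a genome size that is too small inflates it, so the 30× check is only as reliable as that argument.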
2. Alternative approach:
Use a prior BLAST-based estimate or biological knowledge:
- If you have done an initial quick BLASTn of a subset of reads, the top hit genome size gives you an estimate.
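- Alternatively, estimate genome size from read-to-read overlaps (for example, minimap2 all-vs-all overlap output) and the resulting read depth curve. Note that Flye’s --asm-coverage parameter does not do this for you: it limits the initial assembly to a subset of reads at a target coverage and itself relies on the --genome-size estimate.
- If the expected organism is known (e.g., “probably a begomovirus”), published reference genomes for that group (for example, in NCBI) will give you the expected genome size.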
3. set -euo pipefail:
Without this, if minimap2 failed (e.g., due to a corrupted genome file), bash would continue to the next command, attempting to extract unmapped reads from a missing or truncated BAM file, then continuing to assembly. By the end, you would have an assembly of nothing. Individual tools might print errors along the way, but the script itself would carry on and could still exit with status 0, appearing to succeed while producing garbage output.
set -euo pipefail implements decomposition — specifically, it enforces that each sub-step must succeed before the next begins. It makes the pipeline’s modular structure enforceable, not just conceptual.
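The effect of these options can be observed directly. The sketch below is independent of the pipeline script: it runs a deliberately failing pipe (`false | cat`) and a failing command in child shells, so the options do not leak into the calling shell. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Without pipefail, a pipe reports the exit status of its LAST command only.
status_without=$(bash -c 'false | cat; echo "$?"')

# With pipefail, the pipe reports failure if ANY command in it failed.
status_with=$(bash -c 'set -o pipefail; false | cat; echo "$?"')

# With -e, execution stops at the first failing command, so "reached" is never printed.
after_failure=$(bash -c 'set -e; false; echo reached' || true)

echo "without pipefail: exit status ${status_without}"
echo "with pipefail:    exit status ${status_with}"
echo "output after a failure under set -e: '${after_failure}'"
```

In the pipeline, the first case corresponds to minimap2 failing while samtools sort still returns success, which is exactly the silent failure the options are meant to prevent.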
Exercise 4 Solutions: Open-Ended Challenge
Important for instructors: This exercise has no single correct answer. The solutions below represent one strong approach. Reward logical coherence, justified choices, and explicit handling of uncertainty — not agreement with this document.
Part A: Top 10 Questions for Your Supervisor
A strong answer will include questions in this priority order:
- Is a zebrafish reference genome and GTF annotation available? (Determines pipeline type: alignment-based vs. de novo assembly)
- What organism/strain of zebrafish? (Determines which reference to use: GRCz11 for Danio rerio)
- Were the samples run on the same flow cell / the same sequencing batch? (Determines whether batch correction is needed)
- What is the expected fold change of the effect? (Determines power and whether n=4 is sufficient)
- What time point were the embryos treated? (Affects which developmental transcriptome is the right reference)
- Was there a treatment vehicle (DMSO)? (Controls should receive the vehicle, not just water)
- What is the library type? (Stranded or unstranded? Poly-A selection or ribo-depletion? Affects featureCounts parameters)
- Have any samples had quality issues noted at the facility? (Alerts you to samples to monitor closely)
- What is the scientific question — all DE genes, or a specific pathway? (Affects reporting threshold and focus)
- Have any prior RNA-Seq experiments been done in this system? (Allows comparison and controls for known biology)
Part B: Provisional Pipeline
A strong provisional pipeline:
Explicit assumptions made under uncertainty:
- Assuming Danio rerio reference genome (GRCz11) is appropriate
- Assuming library is stranded (most modern RNA-Seq protocols)
- Assuming biological replication is genuinely independent (not technical replicates mislabelled as biological)
- Assuming “treatment” is the primary variable of interest

    Provisional RNA-Seq pipeline for zebrafish antifungal data
    │
    ├── 1. Quality assessment (×12 samples)
    │   ├── 1a. FastQC on all files
    │   ├── 1b. MultiQC to aggregate
    │   ├── 1c. Decision point A: if any sample has < 5M reads → flag for exclusion discussion
    │   └── 1d. Decision point B: if adapter content detected → add Trimmomatic step
    │
    ├── 2. Read alignment
    │   ├── 2a. Download GRCz11 genome + GTF from Ensembl
    │   ├── 2b. Build STAR index (genomeDir, GTF file)
    │   ├── 2c. Align all 12 samples (STAR --outSAMtype BAM SortedByCoordinate)
    │   └── 2d. Evaluate: mapping rate per sample (expect ≥ 75%)
    │
    ├── 3. Quantification
    │   ├── 3a. featureCounts (gene-level, paired-end, stranded)
    │   └── 3b. MultiQC to check assignment rates
    │
    ├── 4. Differential expression
    │   ├── 4a. Import to R (DESeq2)
    │   ├── 4b. PCA: visualise sample clustering
    │   ├── 4c. Decision point C: if batch structure visible → add batch as covariate
    │   ├── 4d. Two contrasts: control vs. low_dose; control vs. high_dose
    │   └── 4e. Volcano plots, heatmaps, gene lists
    │
    └── 5. Functional interpretation
        ├── 5a. GO enrichment (clusterProfiler)
        └── 5b. Pathway analysis (KEGG or Reactome)

    IF no zebrafish reference available:
        Replace steps 2-3 with Trinity de novo assembly + RSEM quantification
        (Much slower; assembly quality uncertain; treat results as exploratory only)

Part C: Interpreting Unexpected Results
Observation 1: Zero DE genes between control and low_dose
| Possible explanation | Diagnostic |
|---|---|
| Effect of low dose is genuinely small; n=4 is underpowered | Examine estimated fold changes — are they < 1.2? Check power calculations |
| Low_dose samples are mislabelled (they are actually controls) | PCA: low_dose samples cluster with controls, not between control and high_dose |
Most relevant principle: Pattern recognition — “zero DE genes with small n” is a known power problem.
Observation 2: 4,872 DE genes between control and high_dose
| Possible explanation | Diagnostic |
|---|---|
| Genuine broad transcriptional response to high dose | Check GO enrichment — does it make biological sense for an antifungal compound? |
| Batch effect (high_dose all from one sequencing lane) | Check MultiQC — do high_dose samples share a specific lane? Check PCA for lane clustering |
Most relevant principle: Abstraction — you need to determine whether the signal is real biology or an artefact of the abstraction (treating lane = treatment).
Observation 3: Controls spread out on PCA (PC1 = 62% within controls)
| Possible explanation | Diagnostic |
|---|---|
| Biological noise (developmental stage variation within control group; zebrafish embryos are staged by hours post-fertilisation) | Check whether developmental timing was recorded — was staging consistent? |
| Technical batch effect (different RNA extraction kits, operators, or dates) | Check sample metadata — were all controls extracted on the same day? |
Most relevant principle: Decomposition — decompose the source of variance. PC1 within a single treatment group should be noise; if it is 62%, there is a structured source of variation that must be identified.
Part D: Scientific Communication Outline
The problem (one sentence): “We need to determine which genes in zebrafish embryos are transcriptionally affected by a novel antifungal compound at two doses, using RNA-Seq data from 12 samples across three treatment groups.”
Major phases:
- Data QC and preprocessing (validate data quality before committing compute time)
- Alignment to zebrafish reference genome (STAR, chosen because it handles spliced RNA-Seq reads; BWA would miss exon-exon junction reads)
- Gene-level quantification (featureCounts, because gene-level aggregation is appropriate for differential expression; transcript-level quantification requires different tools)
- Statistical testing (DESeq2, which fits a negative binomial model designed for count data; a simple t-test is not appropriate)
- Functional interpretation (GO enrichment, which translates gene lists into biological meaning)
Two biggest risks:
- Risk 1: Batch effects masking or inflating biological signal → Mitigation: examine PCA before finalising the statistical model
- Risk 2: Insufficient power with n=4 → Mitigation: if zero DE genes found at FDR 0.05, examine at FDR 0.2 and check estimated effect sizes
What “done” looks like:
- Volcano plots for both contrasts (control vs. low, control vs. high)
- A table of top 50 DE genes per contrast with fold changes and adjusted p-values
- GO enrichment dot plots for significant gene sets
- A brief written interpretation connecting the biology to the antifungal mechanism of action