Lesson 6 — Putting It All Together
Learning Objectives
By the end of this lesson, you will be able to:
- Build a complete Bash script from scratch using all concepts from this module
- Describe the purpose of every line in the finished script
- Run the script on real Training data and verify the output
1. What We Are Building
In this lesson, you will write a script that brings together every concept from Lessons 1–5. The script:
- Defines the location of the input data, the output folder, and the log file as variables
- Sets up automatic logging with exec so all output is captured without any extra effort
- Creates the output folder
- Loops over every FASTQ file in the input folder
- Extracts a clean sample name using basename
- Runs FastQC on each file
- Prints a final message when all files are done
This is a real, useful bioinformatics script. The structure — define your paths, set up logging, loop over files, process each one — is a pattern you will reuse across many projects.
2. Planning Before Writing
Before opening a text editor, take a moment to think through the structure. This is the computational thinking from Module 6 in practice.
What do I need?
- The path to the Training data folder
- A folder for FastQC output
- A log file to record everything that ran
- A loop to handle each file
What will the script do, step by step?
- Set variables for the input directory, output directory, and log file
- Use exec > >(tee "$LOG_FILE") 2>&1 to capture all output automatically from here onwards
- Create the output directory
- Print an opening message
- Loop over the .fastq files and, for each one, extract the sample name, print a progress message, and run FastQC
- Print a final message when the loop ends
3. Building the Script Step by Step
Create a new file:

    nano run_fastqc.sh

Step 1 — The shebang line

    #!/bin/bash

Step 2 — Variables for input, output, and the log

    DATA_DIR="Training/short_reads/paired"
    OUTPUT_DIR="fastqc_results"
    LOG_FILE="run_log.txt"

Store all three paths as variables at the top. If anything needs to change (input folder, output folder, or log name), you update it here and nowhere else.
Step 3 — Set up logging with exec

    exec > >(tee "$LOG_FILE") 2>&1

This single line redirects all subsequent output (every echo, every tool message, and every error) to both the terminal and $LOG_FILE simultaneously. You do not need >> or | tee on any individual line after this. Because tee is called without -a, the log file is overwritten each time the script runs.
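To see the effect of the exec line in isolation, the sketch below writes a tiny throwaway script that uses the same line and then runs it. The names demo_logging.sh and demo_log.txt are made up for this illustration and are not part of the lesson's script, and ls is pointed at a folder that is assumed not to exist purely to produce an error message. The >(...) syntax is a Bash feature, so the demo script is run with bash explicitly.

```shell
# Write a small throwaway script (demo_logging.sh is a hypothetical name).
cat > demo_logging.sh <<'EOF'
#!/bin/bash
LOG_FILE="demo_log.txt"

# From here on, stdout and stderr both go to the terminal and to the log.
exec > >(tee "$LOG_FILE") 2>&1

echo "This line goes to stdout"
ls /no/such/folder_for_demo    # fails on purpose; the message goes to stderr
echo "Script finished"
EOF

# Run it, collecting what would normally appear on the terminal.
terminal_output=$(bash demo_logging.sh)
echo "$terminal_output"

# The log file holds the same lines, including the error from ls.
cat demo_log.txt
```

Both the collected terminal output and demo_log.txt should contain the two echo lines with the ls error between them. Delete the two demo files when you are done.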
Step 4 — Create the output directory and print the opening message

    mkdir -p "$OUTPUT_DIR"
    echo "FastQC run started"

mkdir -p creates the output directory safely: it reports no error if the directory already exists. The echo prints to the terminal and is automatically captured in the log by the exec line above.
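The difference between mkdir -p and plain mkdir is easy to check for yourself. This is a minimal sketch; demo_fastqc_results is a made-up folder name used only for the demonstration.

```shell
OUTPUT_DIR="demo_fastqc_results"   # hypothetical folder name for this demo

mkdir -p "$OUTPUT_DIR"    # first call creates the folder
mkdir -p "$OUTPUT_DIR"    # second call finds it already there and still succeeds
echo "Exit status of second mkdir -p: $?"

# Plain mkdir, by contrast, reports an error when the folder exists.
if mkdir "$OUTPUT_DIR" 2>/dev/null; then
    echo "plain mkdir succeeded"
else
    echo "plain mkdir failed because the folder already exists"
fi
```

This is why the script can be run again and again without first deleting the output folder.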
Step 5 — The loop

    for FILE in "$DATA_DIR"/*.fastq
    do
        SAMPLE=$(basename "$FILE" .fastq)
        echo "Processing: $SAMPLE"
        fastqc "$FILE" --outdir "$OUTPUT_DIR"
    done

On each pass through the loop:
- $FILE holds the full path to one FASTQ file
- $SAMPLE holds the clean sample name extracted by basename
- echo prints a progress message that is automatically captured in the log
- fastqc runs on that file, saving results to $OUTPUT_DIR

The double quotes around $DATA_DIR, $FILE, and $OUTPUT_DIR keep paths that contain spaces intact.
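You can try the loop without FastQC installed. The sketch below uses a scratch folder and two empty stand-in files with made-up names (sampleA_1.fastq and sampleA_2.fastq), and only prints what $FILE and $SAMPLE hold on each pass. It also adds one optional guard that is not in the lesson's script: when no .fastq files exist, the shell leaves the pattern unexpanded, so the loop would otherwise run once on a file that is not there.

```shell
# Build a scratch folder with two empty stand-in files (names are made up).
DATA_DIR=$(mktemp -d)
touch "$DATA_DIR/sampleA_1.fastq" "$DATA_DIR/sampleA_2.fastq"

for FILE in "$DATA_DIR"/*.fastq
do
    # If nothing matched, the pattern itself is passed through unchanged; skip it.
    [ -e "$FILE" ] || continue

    SAMPLE=$(basename "$FILE" .fastq)
    echo "Full path:   $FILE"
    echo "Sample name: $SAMPLE"
done
```

basename removes the leading folders and the .fastq suffix, so the second line of each pair shows only the sample name.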
Step 6 — Final message

    echo "All samples processed. Results in: $OUTPUT_DIR"

This prints to the terminal and goes to the log, with no extra redirection needed.
4. The Complete Script

    #!/bin/bash

    # --- Configuration ---
    DATA_DIR="Training/short_reads/paired"
    OUTPUT_DIR="fastqc_results"
    LOG_FILE="run_log.txt"

    # --- Logging: capture everything from here onwards ---
    exec > >(tee "$LOG_FILE") 2>&1

    # --- Setup ---
    mkdir -p "$OUTPUT_DIR"
    echo "FastQC run started"

    # --- Process each sample ---
    for FILE in "$DATA_DIR"/*.fastq
    do
        SAMPLE=$(basename "$FILE" .fastq)
        echo "Processing: $SAMPLE"
        fastqc "$FILE" --outdir "$OUTPUT_DIR"
    done

    # --- Done ---
    echo "All samples processed. Results in: $OUTPUT_DIR"

5. Running the Script
Save the file, make it executable, and run it:

    chmod +x run_fastqc.sh
    ./run_fastqc.sh

Expected terminal output (FastQC also prints its own messages between these lines):

    FastQC run started
    Processing: SRR1553607_1
    Processing: SRR1553607_2
    All samples processed. Results in: fastqc_results

After it finishes, check the FastQC output files:

    ls fastqc_results/

You should see HTML and zip files for each sample:

    SRR1553607_1_fastqc.html
    SRR1553607_1_fastqc.zip
    SRR1553607_2_fastqc.html
    SRR1553607_2_fastqc.zip

Check the log file:

    cat run_log.txt

Because exec captures everything, the log will contain your echo messages and the FastQC tool output together, giving a complete record of the run.
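Checking the listing by eye works for two files but not for many. One possible way to verify the output is to loop over the inputs again and test whether each has a matching report, relying on the SAMPLE_fastqc.html naming shown in the listing above. This is only a sketch: the scratch folders and sampleA file names are made up, and one report is left out on purpose so the check has something to find.

```shell
# Scratch folders with made-up names stand in for real data here.
# In practice, set these to the DATA_DIR and OUTPUT_DIR from run_fastqc.sh.
DATA_DIR=$(mktemp -d)
OUTPUT_DIR=$(mktemp -d)
touch "$DATA_DIR/sampleA_1.fastq" "$DATA_DIR/sampleA_2.fastq"
touch "$OUTPUT_DIR/sampleA_1_fastqc.html"    # pretend only one report was written

MISSING=0
for FILE in "$DATA_DIR"/*.fastq
do
    [ -e "$FILE" ] || continue               # skip if no .fastq files matched
    SAMPLE=$(basename "$FILE" .fastq)
    if [ -f "$OUTPUT_DIR/${SAMPLE}_fastqc.html" ]; then
        echo "OK:      $SAMPLE"
    else
        echo "MISSING: $SAMPLE"
        MISSING=$((MISSING + 1))
    fi
done
echo "Samples without a report: $MISSING"
```

With the stand-in files, sampleA_2 is reported as missing. On a real run, a count of zero means every input file produced an HTML report.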
6. What You Have Built
Every concept from this module is present in this script:
| Concept | Where in the script |
|---|---|
| Shebang line | Line 1: #!/bin/bash |
| Variables | DATA_DIR, OUTPUT_DIR, LOG_FILE |
| basename | Extracts clean sample name from the full file path |
| Command substitution $() | SAMPLE=$(basename "$FILE" .fastq) |
| For loop | Loops over every .fastq file in DATA_DIR |
| exec > >(tee "$LOG_FILE") 2>&1 | Captures all output, both stdout and stderr, to terminal and log |
7. Adapting the Script
The power of this structure is how easily it adapts:
- Different input folder: change the value of DATA_DIR
- Different output folder: change the value of OUTPUT_DIR
- Different log file: change the value of LOG_FILE
- Different tool: replace fastqc "$FILE" --outdir "$OUTPUT_DIR" with any command that takes a file as input
- More files: drop additional .fastq files into DATA_DIR, and the script picks them up automatically the next time you run it
The script does not need to know in advance how many files exist. It finds them, processes them, and records them — however many there are.
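As an illustration of swapping the tool, the sketch below keeps the same loop but replaces the fastqc line with wc -l to estimate the number of reads per file. It assumes the usual FASTQ layout of four lines per read, and it runs on a tiny made-up file in a scratch folder rather than on the Training data.

```shell
# Scratch folder with one tiny made-up FASTQ file (two reads, four lines each).
DATA_DIR=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$DATA_DIR/sampleA_1.fastq"

for FILE in "$DATA_DIR"/*.fastq
do
    [ -e "$FILE" ] || continue               # skip if no .fastq files matched
    SAMPLE=$(basename "$FILE" .fastq)
    LINE_COUNT=$(wc -l < "$FILE")
    # Assumes the usual four-lines-per-read FASTQ layout.
    echo "$SAMPLE: $((LINE_COUNT / 4)) reads"
done
```

Only the body of the loop changed; the variables, the file pattern, and the basename step are the same as in run_fastqc.sh.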
Summary
- A complete script combines a shebang, variables, basename, a loop, and logging
- exec > >(tee "$LOG_FILE") 2>&1 at the top captures all output automatically, so no >> or | tee is needed on individual lines
- mkdir -p creates a directory safely, with no error if it already exists
- Storing all paths as variables at the top makes the script easy to read and adapt
- The loop structure scales from 2 files to 2000 with no changes to the script
You have now written a real bioinformatics script. Work through the exercises to build confidence with these concepts before moving on.