Module 7 — Bash Scripting for Bioinformatics
Module Overview
Every time you have sat at the terminal and typed the same FastQC command for each sample, one after another, you have been doing work a script could do for you. A Bash script is nothing more than a list of commands saved in a file, but that simple idea unlocks something powerful: the ability to run the same analysis on ten, a hundred, or a thousand samples with a single command.
This module introduces Bash scripting from the ground up. No prior scripting experience is assumed. By the end, you will have written a real script that processes sequencing files from the Training folder — the same kind of script you will use every day as a bioinformatician.
Why This Module Exists
In Modules 1–6 you learned to navigate Linux, manage environments, run quality control, assemble genomes, and think computationally about problems. Every tool you have used (FastQC, Minimap2, Flye) was controlled from the command line, one command at a time.
The next step is automation: writing down your workflow so you do not have to type it again. Bash scripting is how bioinformaticians do this. It is not programming in the traditional sense — you are simply saving the commands you already know into a file and letting Bash run them for you.
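As a first taste of what "saving commands into a file" means, here is a minimal sketch. The file name `hello.sh` and the temporary working folder are assumptions made for illustration, and `#!/bin/bash` is assumed to be the location of Bash on your system. The sketch writes two commands into a file, makes the file executable, and runs it.

```bash
#!/bin/bash
# Sketch only: hello.sh is a hypothetical file name, created in a temporary folder.
workdir=$(mktemp -d)
cd "$workdir"

# Save two commands into a file. The first line (the shebang) names the
# program that should run the rest of the file.
cat > hello.sh <<'EOF'
#!/bin/bash
echo "Hello from a script"
echo "Both commands ran without being typed again"
EOF

# Give the file execute permission, then run it with ./
chmod +x hello.sh
./hello.sh
```

In practice you would normally create the file in a text editor rather than with `cat`; the `cat` form is used here only so the whole example fits in one block.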
Learning Objectives
By the end of this module, you will be able to:
- Explain what a Bash script is and why the shebang line is essential
- Create, make executable, and run a Bash script
- Use variables to store file paths and sample names
- Use `basename` to extract clean filenames and sample names from full paths
- Write a simple for loop to repeat a command across multiple files
- Use `>` and `>>` to save output to a log file
- Use `tee` to print output to the terminal and save it simultaneously
- Use `exec > >(tee logfile) 2>&1` to capture all output, including errors, with a single line at the top of a script
- Build a complete script that automates a real bioinformatics task
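To make the objectives on variables, `basename`, and for loops concrete, here is a small sketch. It treats the two Training read paths as plain text, so nothing needs to exist on disk, and it only prints what it would process rather than running a real tool. The variable names `reads_dir`, `fastq`, and `sample` are chosen for illustration.

```bash
#!/bin/bash
# Sketch only: the paths are handled as text and no sequencing tool is run.

# A variable holding the folder that contains the paired-end reads.
reads_dir="Training/short_reads/paired"

# The loop body runs once for each file path in the list.
for fastq in "$reads_dir/SRR1553607_1.fastq" "$reads_dir/SRR1553607_2.fastq"; do
    # basename removes the folder part; the second argument also removes ".fastq".
    sample=$(basename "$fastq" .fastq)
    echo "Would process: $sample"
done
```

Replacing the `echo` line with a real command is all it takes to turn this into a working batch script.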
Lessons
| Lesson | Title | Focus |
|---|---|---|
| 1 | What Is a Bash Script and Why Use One? | The problem of repetition; what a script is; what you will build |
| 2 | Your First Script — The Shebang Line | Creating a script; the shebang; `chmod +x`; running with `./` |
| 3 | Variables and `basename` | Storing paths and names; extracting clean filenames |
| 4 | Simple For Loops | Repeating commands over a list of files |
| 5 | Saving Output to a Log File | `>` and `>>` redirection; `tee`; `exec > >(tee log) 2>&1` for full capture |
| 6 | Putting It All Together | A complete script using all concepts on real Training data |
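The redirection operators covered in Lesson 5 can be previewed with a short sketch. It assumes a temporary log file created with `mktemp`, and the messages are placeholders rather than real tool output. It shows `>`, `>>`, and `tee` one at a time; the `exec > >(tee log) 2>&1` form is left for the lesson itself.

```bash
#!/bin/bash
# Sketch only: the log is a temporary file and the messages are placeholders.
log=$(mktemp)

# ">" creates the file, or overwrites it if it already exists.
echo "run started" > "$log"

# ">>" adds to the end of the file instead of overwriting it.
echo "step 1 finished" >> "$log"

# The braces group two commands: one writes to standard output, one to standard error.
# "2>&1" sends the error stream to the same place as normal output, and
# "tee -a" prints everything to the terminal while appending it to the log.
{ echo "normal message"; echo "error message" >&2; } 2>&1 | tee -a "$log"

echo "Log saved to: $log"
```

Without `2>&1`, the error message would still appear on the terminal but would never reach the log file, which is why the lesson treats error capture as its own topic.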
Supporting Materials
| File | Purpose |
|---|---|
| `Exercises.md` | Hands-on exercises with graduated difficulty |
| `Solutions.md` | Complete worked solutions |
Prerequisites
Before starting this module, you should be comfortable with:
- Basic Linux navigation (`cd`, `ls`, `pwd`) from Module 1
- Running tools from the command line (e.g., `fastqc`, `conda activate`)
- The concept of file paths (absolute and relative)
No scripting experience is required.
Training Data
The scripts in this module use files from the `Training/` folder:
- `Training/short_reads/paired/SRR1553607_1.fastq`
- `Training/short_reads/paired/SRR1553607_2.fastq`

These are the same Illumina paired-end reads used in earlier modules.
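Because the two files differ only in their `_1` and `_2` endings, a script can work out the shared sample name from one file and build the path of its partner. The sketch below assumes that naming pattern holds, and the variable names `read1`, `read2`, and `sample` are chosen for illustration. It works on the path as text, so the files do not need to be present.

```bash
#!/bin/bash
# Sketch only: assumes paired files are named <sample>_1.fastq and <sample>_2.fastq.
reads_dir="Training/short_reads/paired"
read1="$reads_dir/SRR1553607_1.fastq"

# Giving basename the suffix "_1.fastq" leaves only the sample name.
sample=$(basename "$read1" _1.fastq)

# Rebuild the path of the partner file from the folder and the sample name.
# The braces mark where the variable name ends, so Bash does not look for "sample_2".
read2="$reads_dir/${sample}_2.fastq"

echo "Sample: $sample"
echo "Mate:   $read2"
```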
A Note on Simplicity
This module deliberately teaches a small, useful subset of Bash. There is much more to Bash scripting, including conditional statements, functions, and error handling, but these are not needed to write effective bioinformatics scripts at this stage, and adding them too early makes scripts harder to read and debug.
Learn these foundations well. Everything else builds on them.