Module 7 — Bash Scripting for Bioinformatics

Every time you have sat at the terminal and typed the same FastQC command for each sample, one after another, you have been doing work a script could do for you. A Bash script is nothing more than a list of commands saved in a file — but that simple idea unlocks something powerful: the ability to run the same analysis on ten, a hundred, or a thousand samples with a single command.

This module introduces Bash scripting from the ground up. No prior scripting experience is assumed. By the end, you will have written a real script that processes sequencing files from the Training folder — the same kind of script you will use every day as a bioinformatician.


In Modules 1–6 you learned to navigate Linux, manage environments, run quality control, assemble genomes, and think computationally about problems. Every tool you have used — FastQC, Minimap2, Flye — was controlled from the command line, one command at a time.

The next step is automation: writing down your workflow so you do not have to type it again. Bash scripting is how bioinformaticians do this. It is not programming in the traditional sense — you are simply saving the commands you already know into a file and letting Bash run them for you.
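As a minimal sketch of that idea, the snippet below saves three familiar commands into a file and then asks Bash to run them. The filename first_script.sh and the echoed messages are hypothetical, chosen only for illustration; Lesson 2 covers the shebang line and chmod +x properly.

```bash
# Save three familiar commands into a file called first_script.sh
# (a hypothetical name chosen for this sketch).
cat > first_script.sh <<'EOF'
#!/bin/bash
echo "Analysis started"
pwd
echo "Analysis finished"
EOF

# Ask Bash to run every command in the file, top to bottom.
bash first_script.sh
```

Running the file prints the start message, the current directory, and the finish message, exactly as if you had typed the three commands yourself.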


By the end of this module, you will be able to:

  • Explain what a Bash script is and why the shebang line is essential
  • Create, make executable, and run a Bash script
  • Use variables to store file paths and sample names
  • Use basename to extract clean filenames and sample names from full paths
  • Write a simple for loop to repeat a command across multiple files
  • Use > and >> to save output to a log file
  • Use tee to print output to the terminal and save it simultaneously
  • Use exec > >(tee logfile) 2>&1 to capture all output — including errors — with a single line at the top of a script
  • Build a complete script that automates a real bioinformatics task
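To give a feel for how several of these pieces fit together, here is a rough sketch that combines a variable, a for loop, > and >> redirection, and tee. It is not the script built in Lesson 6. The log name run.log is an assumption, and echo stands in for a real tool so the sketch runs even where FastQC is not installed.

```bash
#!/bin/bash
# Sketch only: run.log is a hypothetical log name, and echo stands in
# for a real tool such as fastqc.

log="run.log"

# > starts a fresh log file (overwriting any old one); >> appends to it.
echo "Run started" > "$log"

for fq in Training/short_reads/paired/SRR1553607_1.fastq \
          Training/short_reads/paired/SRR1553607_2.fastq
do
    # tee -a prints the line to the terminal and appends it to the log.
    echo "Would run: fastqc $fq" | tee -a "$log"
done

echo "Run finished" >> "$log"
```

After the script finishes, run.log holds four lines: the start message, one line per FASTQ file, and the finish message. Only the two loop lines also appear on the terminal, because only they pass through tee.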

| Lesson | Title | Focus |
|---|---|---|
| 1 | What Is a Bash Script and Why Use One? | The problem of repetition; what a script is; what you will build |
| 2 | Your First Script — The Shebang Line | Creating a script; the shebang; chmod +x; running with ./ |
| 3 | Variables and basename | Storing paths and names; extracting clean filenames |
| 4 | Simple For Loops | Repeating commands over a list of files |
| 5 | Saving Output to a Log File | > and >> redirection; tee; exec > >(tee log) 2>&1 for full capture |
| 6 | Putting It All Together | A complete script using all concepts on real Training data |

| File | Purpose |
|---|---|
| Exercises.md | Hands-on exercises with graduated difficulty |
| Solutions.md | Complete worked solutions |

Before starting this module, you should be comfortable with:

  • Basic Linux navigation (cd, ls, pwd) from Module 1
  • Running tools from the command line (e.g., fastqc, conda activate)
  • The concept of file paths (absolute and relative)

No scripting experience is required.


The scripts in this module use files from the Training/ folder:

Training/short_reads/paired/SRR1553607_1.fastq
Training/short_reads/paired/SRR1553607_2.fastq

These are the same Illumina paired-end reads used in earlier modules.
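As a small preview of Lesson 3, the sketch below shows how basename turns one of these paths into a clean filename and a sample name. The variable names fq, filename, and sample are illustrative choices, not names the module requires. basename works on the text of the path, so the file does not need to exist for this to run.

```bash
#!/bin/bash
# Store the path to the forward reads in a variable.
fq="Training/short_reads/paired/SRR1553607_1.fastq"

# basename strips the directory part of the path.
filename=$(basename "$fq")

# Given a second argument, basename also strips that suffix.
sample=$(basename "$fq" _1.fastq)

echo "File:   $filename"
echo "Sample: $sample"
```

Here filename becomes SRR1553607_1.fastq and sample becomes SRR1553607, which is handy for naming output folders and log files after each sample.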


This module deliberately teaches a small, useful subset of Bash. There is much more to Bash scripting — conditional statements, functions, error handling — but these are not needed to write effective bioinformatics scripts at this stage, and adding them too early makes scripts harder to read and debug.

Learn these foundations well. Everything else builds on them.