If you’ve ever tried running Nextflow workflows in the cloud, you know the struggle is real. Local machines? Too slow. Manual scaling? A nightmare. That’s why I decided to dive headfirst into running Nextflow on Amazon EKS (Elastic Kubernetes Service).
In this guide, I’ll walk you through exactly how I set up a Nextflow environment on EKS, complete with spot instances for cost savings and secure S3 access using Pod Identity.
What is Nextflow?
Before we dive into the “why EKS” question, let’s talk about what Nextflow actually is.
Nextflow is a workflow orchestration engine designed for data-intensive computational pipelines. Think of it as a sophisticated task manager that can:
- Chain together multiple steps: Take raw data, run it through analysis tools, and generate reports in a defined sequence
- Handle dependencies automatically: Task B won’t run until Task A finishes successfully
- Parallelize work: If you have 100 samples to analyze, Nextflow can process them all at once (resource permitting)
- Resume from failures: If your pipeline crashes at step 5 out of 10, you can restart from step 5 instead of starting over
It was originally built for bioinformatics (genomics workflows can have dozens of steps and take days to run), but it’s now used in machine learning, data science, and anywhere you have complex, multi-step computational workflows.
The key feature? Nextflow uses containers (like Docker) for each task, which means your analysis is reproducible: it'll run the same way on my laptop as it does on your cloud cluster.
Why EKS for Nextflow?
Before we jump in, let’s talk about why this setup makes sense.
Nextflow is fantastic for orchestrating complex computational workflows, but it needs resources, and lots of them. Running everything locally or on a single EC2 instance quickly becomes a bottleneck. Kubernetes, on the other hand, gives us:
- Auto-scaling: Spin up pods as needed, shut them down when done
- Cost optimization: Use spot instances to cut costs by up to 90%
- Flexibility: Run multiple workflows concurrently without stepping on each other’s toes
- Reliability: Built-in retry logic and fault tolerance
EKS takes all of this and makes it managed, which means less time wrestling with control planes and more time getting actual work done.
The Game Plan
Here’s what we’re building:
- An EKS cluster optimized for batch workloads
- Secure S3 access using Pod Identity
- Kubernetes RBAC so our Nextflow head pod can manage worker pods
- GitHub integration using SSH keys for pipeline code management
Let’s break it down phase by phase.
Prerequisites: Getting Your Tools Ready
Before we dive into building the cluster, we need to make sure our local environment is set up properly. You’ll need two essential tools: the AWS CLI and kubectl.
Configure AWS CLI
If you haven’t already, install and configure the AWS CLI. This is how you’ll interact with AWS services from your terminal.
Install the AWS CLI (if needed):
# For Linux
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# For macOS
brew install awscli
Now configure it with your credentials:
aws configure
You’ll be prompted for:
- AWS Access Key ID: Your IAM user access key
- AWS Secret Access Key: Your secret key
- Default region name: I’m using us-east-1 for this guide
- Default output format: json works great
Pro tip: If you’re working with multiple AWS accounts or profiles, use aws configure --profile <profile-name> to keep things organized.
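For example (the profile name here is just a placeholder, pick whatever fits your setup):
# Create a named profile
aws configure --profile nextflow-dev

# Use it for a single command...
aws s3 ls --profile nextflow-dev

# ...or export it for the whole shell session
export AWS_PROFILE=nextflow-dev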
Install kubectl
kubectl is your command-line tool for interacting with Kubernetes clusters. You absolutely need this.
For Linux:
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
For macOS:
brew install kubectl
Verify the installation:
kubectl version --client
Install eksctl
While we’re at it, let’s install eksctl too. We'll need it for creating the cluster:
# For Linux
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

# For macOS
brew install eksctl
Alright, tools ready!
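Before building anything, a quick sanity check that all three tools are on your PATH doesn't hurt:
aws --version
kubectl version --client
eksctl version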
Phase 1: Building Our Infrastructure
Setting Up the EKS Cluster
First, we need a cluster. I’m using eksctl because it handles a lot of the heavy lifting for us.
eksctl create cluster \
--name nextflow-eks-cluster \
--region us-east-1 \
--nodegroup-name spot-nodes \
--instance-types t3.medium \
--managed \
--spot \
--nodes 1 \
--with-oidc \
--vpc-nat-mode Disable
A quick note on that last flag: I disabled NAT Gateway here to keep costs down during testing. In production, you’ll want to enable it for better security and connectivity.
The --spot flag is doing the real magic here. It tells EKS to use spot instances, which are way cheaper than on-demand instances. Perfect for batch workloads where a little interruption risk is acceptable.
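One caveat with spot: a worker pod can be reclaimed mid-task. Nextflow can absorb that if you tell it to retry failed tasks. Here’s a minimal sketch of what you could add to the nextflow.config we write later; errorStrategy and maxRetries are standard Nextflow directives, and the values are just a reasonable starting point:
process {
    // Re-run a task if its pod dies (e.g., spot reclamation)
    errorStrategy = 'retry'
    maxRetries = 2
}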
Installing Pod Identity Agent
Now here’s where things get interesting. Instead of creating IAM users and storing access keys, we’re using EKS Pod Identity. This lets our pods assume IAM roles directly.
eksctl create addon \
--cluster nextflow-eks-cluster \
--name eks-pod-identity-agent \
--region us-east-1
This add-on runs as a DaemonSet on your cluster and handles all the credential magic behind the scenes.
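You can confirm the add-on landed with either eksctl or kubectl:
# Check the add-on status
eksctl get addon --cluster nextflow-eks-cluster --region us-east-1

# The agent pods live in kube-system
kubectl get pods -n kube-system | grep eks-pod-identity-agent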
S3 Bucket Setup
Head over to the S3 console and create a bucket. Inside it, create a folder called inputs. This is where we'll store our input data.
For this example, I’m using a FASTQ file (bioinformatics folks will know what this is). You can grab a sample dataset from public genomics repositories if you want to follow along.
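If you prefer the terminal to the console, the equivalent CLI steps look roughly like this (swap in your own bucket name and FASTQ file):
# Create the bucket (names are globally unique)
aws s3 mb s3://<your-bucket-name> --region us-east-1

# Upload your input data into the inputs/ folder
aws s3 cp dataset.fastq s3://<your-bucket-name>/inputs/dataset.fastq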
IAM Permissions
Our Nextflow pods need two kinds of permissions:
- IAM permissions to access S3 buckets (handled here with a Pod Identity association)
- Kubernetes permissions to create and manage worker pods (handled in Phase 2 with RBAC)
The command below takes care of the IAM half:
eksctl create podidentityassociation \
--cluster nextflow-eks-cluster \
--namespace nextflow \
--serviceaccount nextflow-sa \
--role-name nextflow-pod-role \
--permission-policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess \
--region us-east-1
You can also do this through the AWS console.
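Either way, it's worth verifying that the association exists before moving on:
eksctl get podidentityassociation --cluster nextflow-eks-cluster --region us-east-1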
Phase 2: Kubernetes Configuration
Now that our infrastructure is ready, let’s configure Kubernetes to handle our Nextflow workloads.
Creating the Namespace and Service Account
Namespaces keep things organized. Let’s create one specifically for Nextflow:
kubectl create namespace nextflow
kubectl create serviceaccount nextflow-sa --namespace nextflow
RBAC: Giving Permissions
RBAC (Role-Based Access Control) is how we tell Kubernetes what our Nextflow head pod is allowed to do. We need it to create worker pods, check their status, and clean them up when done.
Save this as rbac.yaml:
# Define what the role can do
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nextflow-role
  namespace: nextflow
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "pods/status"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
# Bind the role to our service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nextflow-rolebinding
  namespace: nextflow
subjects:
  - kind: ServiceAccount
    name: nextflow-sa
    namespace: nextflow
roleRef:
  kind: Role
  name: nextflow-role
  apiGroup: rbac.authorization.k8s.io
Apply it:
kubectl apply -f rbac.yaml
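A quick way to sanity-check the binding is to ask Kubernetes directly whether the service account can create pods; if the RBAC is wired up correctly, this prints "yes":
kubectl auth can-i create pods \
  --as=system:serviceaccount:nextflow:nextflow-sa \
  -n nextflow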
SSH Keys for GitHub Access
Since we’re pulling our Nextflow pipeline from GitHub, we need authentication. SSH keys are the way to go.
Spoiler alert: If your GitHub repo is public, you can skip this entire SSH key setup and use HTTPS URLs to clone. But for private repos, SSH keys are essential.
On your local machine, generate a new key pair:
ssh-keygen -t ed25519 -C "nextflow-eks-pipeline"
When prompted, save it to a specific location, like ~/.ssh/nextflow_eks_key (or just hit Enter to use the default location). Don't set a passphrase; Kubernetes won't be able to use a password-protected key.
Now store the private key as a Kubernetes secret:
kubectl create secret generic git-ssh-key \
--from-file=ssh-privatekey=/path/to/your/private/key \
-n nextflow
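You can double-check that the key made it into the cluster without printing it (describe only shows the data keys and their sizes):
kubectl describe secret git-ssh-key -n nextflow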
Phase 3: GitHub Configuration
Take the public key you just generated and add it to your GitHub repository:
- Go to your repo’s Settings → Deploy keys
- Click Add deploy key
- Paste your public key
- Give it a descriptive name like “EKS Nextflow Runner.”
- Check “Allow write access” if your workflow needs to push results back (optional)
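If you need to print the public key to paste into that form, and want to confirm GitHub accepts it afterwards (assuming you saved the key to ~/.ssh/nextflow_eks_key as above):
# Print the public half of the key pair
cat ~/.ssh/nextflow_eks_key.pub

# Optional: test authentication against GitHub with the private key
ssh -i ~/.ssh/nextflow_eks_key -T git@github.com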
Phase 4: The Implementation
This is where it all comes together. We need two things: our Nextflow pipeline code and a Kubernetes Job to run it.
The Nextflow Pipeline
For this example, we’re running a bioinformatics quality control workflow on genomic sequencing data. Specifically:
What we’re doing:
- FASTQC: Analyzes our raw sequencing data (FASTQ file) and generates quality control reports. It checks things like read quality scores, sequence duplication levels, GC content, and potential contamination. Each dataset gets its own HTML report with pretty graphs.
- MultiQC: Takes all those individual FASTQC reports and aggregates them into a single, beautiful summary report. Instead of opening 10 different HTML files, you get a comprehensive view of your data quality in one place.
Think of it this way: FASTQC is your detailed inspector, checking each piece individually, and MultiQC is the manager who summarises everything into a single executive report.
main.nf: This is your pipeline definition:
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

params.reads = "s3://<your-bucket-name>/inputs/dataset.fastq"

process FASTQC {
    container 'biocontainers/fastqc:v0.11.9_cv8'
    publishDir "${params.outdir}/fastqc", mode: 'copy'

    input:
    path reads

    output:
    tuple path("*.html"), path("*_fastqc.zip")

    script:
    """
    echo "Running FASTQC on ${reads}"
    fastqc ${reads}
    echo "FASTQC completed for ${reads}"
    """
}

process MULTIQC {
    container 'multiqc/multiqc:dev'
    publishDir "${params.outdir}/multiqc", mode: 'copy'

    input:
    path fastqc_reports

    output:
    path "multiqc_report.html"
    path "multiqc_data"

    script:
    """
    echo "Aggregating FASTQC reports with MultiQC"
    multiqc .
    echo "MultiQC report generated successfully"
    """
}

workflow {
    reads_ch = Channel.fromPath(params.reads)
    fastqc_reports_ch = FASTQC(reads_ch)
    MULTIQC(fastqc_reports_ch.collect())
}
Important notes:
- Update <your-bucket-name> to match your S3 bucket
nextflow.config: This tells Nextflow how to run on Kubernetes:
plugins {
    id 'nf-wave'
    id 'nf-amazon'
}

wave.enabled = true
fusion.enabled = true
fusion.exportStorageCredentials = true

process {
    executor = 'k8s'
    cpus = 1
    memory = '2 GB'
    maxForks = 1

    withName: 'FASTQC' {
        container = 'biocontainers/fastqc:v0.11.9_cv8'
    }
    withName: 'MULTIQC' {
        container = 'multiqc/multiqc:dev'
    }
}

k8s {
    namespace = 'nextflow'
    serviceAccount = 'nextflow-sa'
}
Push this to your GitHub repo.
The Kubernetes Job
Now for the piece that ties everything together. But first, let’s talk about why we’re using a Kubernetes Job instead of a Deployment or a plain Pod.
Why a Job?
Kubernetes gives us several ways to run workloads, and choosing the right one matters:
- Pod: Runs once, but if it fails, it’s done. No retry logic, no cleanup guarantees. Not ideal for batch workloads.
- Deployment: Designed for long-running services that should stay up. If a pod dies, it gets restarted automatically. Great for web servers, terrible for batch jobs that need to run once and finish.
- Job: Perfect for batch workloads! It runs to completion, has built-in retry logic (controlled by backoffLimit), and can automatically clean itself up when done (via ttlSecondsAfterFinished).
For Nextflow pipelines, we want:
- Run once and finish
- Automatic cleanup after completion
- Clear success/failure status
- No accidental restarts
That’s exactly what a Job gives us.
Save this as nextflow-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  namespace: nextflow
  generateName: "nextflow-job-"
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 300
  template:
    spec:
      serviceAccountName: nextflow-sa
      restartPolicy: Never
      initContainers:
        - name: fetch-pipeline
          image: alpine/git:latest
          env:
            - name: BRANCH
              value: "main" # Change to your branch name
          command:
            - sh
            - -c
            - |
              set -e
              mkdir -p /root/.ssh
              cp /var/ssh/ssh-privatekey /root/.ssh/id_rsa
              chmod 600 /root/.ssh/id_rsa
              ssh-keyscan github.com >> /root/.ssh/known_hosts
              git clone -b "${BRANCH}" git@github.com:<your-username>/<your-repo>.git /workspace
          volumeMounts:
            - name: workspace
              mountPath: /workspace
            - name: git-ssh-key
              mountPath: /var/ssh
              readOnly: true
      containers:
        - name: nextflow
          image: nextflow/nextflow:25.12.0-edge
          env:
            - name: NXF_PLUGINS_DEFAULT
              value: "nf-amazon,nf-wave,nf-k8s"
            - name: JOB_NAME
              value: "fastqc-analysis"
          command:
            - sh
            - -c
            - |
              set -e
              cd /workspace
              WORK_DIR="s3://<your-bucket-name>/work/${JOB_NAME}"
              RESULTS_DIR="s3://<your-bucket-name>/results/${JOB_NAME}"
              echo "Work directory: $WORK_DIR"
              echo "Results directory: $RESULTS_DIR"
              exec nextflow run main.nf \
                -c nextflow.config \
                --outdir "$RESULTS_DIR" \
                -work-dir "$WORK_DIR" \
                -resume
          volumeMounts:
            - name: workspace
              mountPath: /workspace
      volumes:
        - name: workspace
          emptyDir: {}
        - name: git-ssh-key
          secret:
            secretName: git-ssh-key
            defaultMode: 0400
Important notes:
- Update <your-username>/<your-repo>.git to match your GitHub repo
- Update <your-bucket-name> to match your S3 bucket
- The generateName field means each run creates a unique job
Running the Pipeline
Here’s the fun part. Launch your pipeline:
kubectl create -f nextflow-job.yaml
Note: We use kubectl create instead of kubectl apply because generateName creates a new job each time.
Monitoring Your Workflow
Watch the job spin up:
kubectl get jobs -n nextflow
Check the logs in real time:
kubectl logs job/<job-name> -n nextflow -f
You’ll see Nextflow do its thing, spawning worker pods, running tasks, and saving results to S3.
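It's also worth keeping an eye on the worker pods Nextflow spawns for each process:
kubectl get pods -n nextflow -w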
Checking Your Results
Once everything completes, head to S3. You should see your FASTQC HTML reports and MultiQC summary. Beautiful!
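If you'd rather check from the terminal, list the results prefix we set up in the Job (same bucket and paths as above):
aws s3 ls s3://<your-bucket-name>/results/ --recursive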
Cleanup
The ttlSecondsAfterFinished: 300 setting in our Job spec means Kubernetes automatically deletes completed jobs after 5 minutes. This keeps your cluster clean.
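If you want to clean up by hand, or tear everything down once you're finished experimenting, these two commands cover it (deleting the cluster removes the nodes and stops the spot charges):
# Remove any leftover jobs and their pods
kubectl delete jobs --all -n nextflow

# When you're completely done, delete the cluster
eksctl delete cluster --name nextflow-eks-cluster --region us-east-1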
Conclusion
Running Nextflow on EKS gives you the best of both worlds: the power and flexibility of Nextflow with the scalability and reliability of Kubernetes. Sure, there’s a learning curve, but once you have this foundation in place, you can run virtually any computational workflow efficiently and cost-effectively.
If you found this helpful, consider giving it a clap or sharing it with your DevOps and bioinformatics friends. And if you run into issues or have questions, drop a comment below. I’m always happy to help troubleshoot.
Happy Hosting!

