Maybe it's a child with a rare, undiagnosed condition who has been
through years of misdiagnoses and specialist referrals. Maybe it's a cancer patient whose oncologist is
holding off on a treatment decision until the genomic report lands. Maybe it's a couple in a genetic
counselor's office who submitted a sample three weeks ago and haven't heard back yet.
And somewhere in a clinical genomics lab, a bioinformatician is staring at a VCF with 80,000 variants, cross-referencing a ClinVar tab, pulling up a spreadsheet that a colleague built two years ago, and waiting for VEP to finish running before they can start on ANNOVAR.
This is the gap no one talks about enough. Primary analysis (basecalling) is largely a solved problem. Secondary analysis (alignment and variant calling) has been industrialized. DRAGEN can take you from FASTQ to VCF in under an hour. But tertiary analysis (annotation, filtering, prioritization, interpretation, report generation) is where days disappear. Sometimes weeks.
And those days are not abstract. They belong to patients.
The Time Math Is Uncomfortable
Let's be honest about the numbers. Primary + secondary on a WGS sample: 4-8 hours of compute, depending
on coverage and infrastructure. Largely automated. You kick it off, go home, and it's done in the
morning.
Tertiary analysis in a typical clinical lab: anywhere from 2 days to several weeks,
depending on case complexity and analyst bandwidth. And the majority of that time isn't compute, it's
human time. Context-switching, manual lookups, interpretation steps that require a credentialed person
at every junction even when the variant is textbook pathogenic.
A study analyzing clinical genomics workflows found that interpretation, even in straightforward rare disease cases handled by expert clinical scientists, can take up to 11 hours per case, with complex cases pushing past 16 hours. Approximately 70% of analysis costs in these labs is staff time. Not sequencing. Not compute. People.
That's not a biology problem. That's an engineering and workflow problem. And it's fixable.
Why Does Tertiary Analysis Actually Take So Long?
Before we talk about fixes, we need to be precise about diagnosis. Because "tertiary analysis is
complex" is not a diagnosis, it's an excuse. Here's what's actually happening:
Annotation is mostly serial, not parallel.
You run VEP. You wait. Then you cross-reference ClinVar. Then you pull CADD scores. Then you check
gnomAD allele frequencies. Each of these is a separate query, often a separate tool invocation,
sometimes a separate tab in your browser.
Filtering logic lives in human memory and spreadsheets, not code.
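Ask most clinical bioinformaticians how they filter variants and you'll hear something like: "We usually start by removing common variants above 1% in gnomAD, then look at predicted impact, then check disease gene panels..." The recipe is reasonable, but it lives in conversation and habit, not in anything a pipeline can run or a reviewer can audit.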
There's no tiering layer. Every variant gets the same level of attention.
This is the big one. A WES run can produce 20,000-80,000 variants. The vast majority of these are
benign, common, or completely irrelevant to the clinical question. But in many labs, there's no
systematic pre-classification layer that separates the clearly pathogenic (PVS1 + PS1 in a
disease-relevant gene, this is not ambiguous), the probably benign, and the genuinely uncertain.
Everything lands in the same pile. Senior analysts spend as much cognitive energy dismissing obvious
noise as they do on the variants that actually require judgment.
How People Are Actually Solving This
Labs and tools have been chipping away at this problem, and the patterns are worth understanding,
because the pattern is more valuable than the specific tool.
Parallelizing annotation pipelines.
The architectural shift here is simple: stop treating annotation as a sequential pipeline and start
treating it as a set of parallel jobs with a final merge step. Run VEP and ANNOVAR concurrently. Fire
your ClinVar lookups and gnomAD frequency queries simultaneously. Use Nextflow or Snakemake to
coordinate these as parallel processes rather than sequential bash scripts.
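A rough Nextflow pattern looks like this:
    workflow ANNOTATE {
        take: vcf_ch
        main:
        // Each process reads the same input channel and none depends on
        // another, so Nextflow is free to schedule all three concurrently.
        VEP(vcf_ch)
        ANNOVAR(vcf_ch)
        CLINVAR_LOOKUP(vcf_ch)
        // The merge starts only once all three outputs are available.
        MERGE_ANNOTATIONS(
            VEP.out,
            ANNOVAR.out,
            CLINVAR_LOOKUP.out
        )
    }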
Three annotation sources. Running simultaneously. Merged at the end. This isn't clever, it's just not
doing things serially when you don't have to.
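The same fan-out and merge shape can be sketched outside a workflow manager. The Python below is a minimal illustration, not a production pipeline: the three annotation functions are hypothetical stand-ins that return canned values, where real ones would invoke a tool or query a database. Threads are used on the assumption that the real work is I/O-bound (subprocess calls, network lookups).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real annotation sources. Each takes a list of
# variant IDs and returns {variant_id: {field: value}}.
def annotate_consequence(variants):
    return {v: {"consequence": "missense_variant"} for v in variants}

def annotate_clinvar(variants):
    return {v: {"clinvar_significance": "Pathogenic"} for v in variants}

def annotate_frequency(variants):
    return {v: {"gnomad_af": 0.0001} for v in variants}

def annotate_in_parallel(variants, sources):
    """Run every annotation source concurrently, then merge per variant."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(source, variants) for source in sources]
        # result() blocks, so this line waits for the slowest source.
        results = [future.result() for future in futures]

    # Final merge step: combine the fields from every source per variant.
    merged = {v: {} for v in variants}
    for result in results:
        for variant, fields in result.items():
            merged[variant].update(fields)
    return merged

variants = ["chr1:12345:A>G", "chr2:67890:C>T"]
merged = annotate_in_parallel(
    variants, [annotate_consequence, annotate_clinvar, annotate_frequency]
)
```

Total wall time is roughly that of the slowest source rather than the sum of all three, which is the whole point of the pattern.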
Encoding filtering logic as versioned, auditable code.
The move here is to take whatever logic lives in your head or your Excel file and formalize it into a config-driven filtering script. The logic doesn't change: your threshold for gnomAD MAF is still 0.01, your impact filter is still HIGH/MODERATE. But now it's in a YAML config file that's version-controlled, reproducible, and shareable with a new analyst on their first day.
    # variant_filter_config.yaml
    population_frequency:
      gnomad_af_max: 0.01
      gnomad_af_popmax: 0.005
    consequence_include:
      - stop_gained
      - frameshift_variant
      - splice_acceptor_variant
      - splice_donor_variant
      - missense_variant
    clinvar_exclude_significance:
      - Benign
      - Likely_benign
This is not sophisticated code. But it's the difference between filtering logic that lives in one person's head and filtering logic that is transparent, consistent, and can be improved systematically.
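To show how a config like this might drive a filter, here is a minimal Python sketch. It assumes each variant is a dict with the hypothetical keys `gnomad_af`, `gnomad_af_popmax`, `consequence`, and `clinvar_significance`, and it treats a missing frequency as 0.0 (variant absent from gnomAD), which is a policy choice rather than a given. The config is written as a plain dict so the example needs no YAML parser; a real pipeline would load it from the file.

```python
# Mirrors variant_filter_config.yaml as a plain dict (assumption: in a real
# pipeline this would be parsed from the version-controlled YAML file).
CONFIG = {
    "population_frequency": {"gnomad_af_max": 0.01, "gnomad_af_popmax": 0.005},
    "consequence_include": [
        "stop_gained",
        "frameshift_variant",
        "splice_acceptor_variant",
        "splice_donor_variant",
        "missense_variant",
    ],
    "clinvar_exclude_significance": ["Benign", "Likely_benign"],
}

def passes_filter(variant, config):
    """Return True if the variant survives every filter in the config."""
    freq = config["population_frequency"]
    # A missing or None frequency is treated as 0.0 (sketch assumption).
    if (variant.get("gnomad_af") or 0.0) > freq["gnomad_af_max"]:
        return False
    if (variant.get("gnomad_af_popmax") or 0.0) > freq["gnomad_af_popmax"]:
        return False
    if variant.get("consequence") not in config["consequence_include"]:
        return False
    if variant.get("clinvar_significance") in config["clinvar_exclude_significance"]:
        return False
    return True

# Illustrative records only, not real data.
variants = [
    {"id": "a", "gnomad_af": 0.0002, "gnomad_af_popmax": 0.001,
     "consequence": "missense_variant", "clinvar_significance": None},
    {"id": "b", "gnomad_af": 0.05, "gnomad_af_popmax": 0.08,
     "consequence": "missense_variant", "clinvar_significance": None},
    {"id": "c", "gnomad_af": 0.0001, "gnomad_af_popmax": 0.0001,
     "consequence": "stop_gained", "clinvar_significance": "Benign"},
]
kept = [v for v in variants if passes_filter(v, CONFIG)]
```

Changing a threshold now means editing one value in the config and committing it, so every change to the filtering policy has an author, a date, and a diff.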
Building a tiering layer before human review.
The goal here is to protect senior analyst time for cases that genuinely require judgment, and automate
the routing of everything else. A practical tiering approach based on ACMG/AMP criteria looks something
like:
Tier 1 (Automate with high confidence): Variants with ClinVar pathogenic/likely pathogenic classification, 2+ star review status, in a gene directly relevant to the patient's phenotype.
Tier 2 (Expedited review): Variants with predicted high impact (PVS1-triggering) in known disease genes, novel variants in well-characterized genes.
Tier 3 (Full review): Genuinely ambiguous, conflicting evidence, VUS with uncertain functional impact, novel genes. This is where expert analyst time actually adds value.
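The three tiers above can be sketched as a small routing function. This is a deliberately simplified illustration, not an ACMG/AMP classifier: the field names (`gene`, `consequence`, `clinvar_significance`, `clinvar_stars`) and the two gene sets are hypothetical, "PVS1-triggering" is approximated as a loss-of-function consequence in a known disease gene, and "novel" is approximated as having no ClinVar record.

```python
PATHOGENIC = {"Pathogenic", "Likely_pathogenic"}
# Rough proxy for PVS1-triggering consequences (sketch assumption).
LOSS_OF_FUNCTION = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}

def assign_tier(variant, phenotype_genes, disease_genes):
    """Route a variant to tier 1, 2, or 3 ahead of human review."""
    gene = variant["gene"]
    significance = variant.get("clinvar_significance")
    stars = variant.get("clinvar_stars") or 0

    # Tier 1: well-reviewed ClinVar P/LP call in a phenotype-relevant gene.
    if significance in PATHOGENIC and stars >= 2 and gene in phenotype_genes:
        return 1
    # Tier 2: high-impact or novel variant in a known disease gene.
    if gene in disease_genes and (
        variant.get("consequence") in LOSS_OF_FUNCTION or significance is None
    ):
        return 2
    # Tier 3: everything else goes to full expert review.
    return 3

# Illustrative inputs only, not real case data.
phenotype_genes = {"CFTR"}
disease_genes = {"CFTR", "BRCA2"}
examples = [
    {"gene": "CFTR", "consequence": "missense_variant",
     "clinvar_significance": "Pathogenic", "clinvar_stars": 2},
    {"gene": "BRCA2", "consequence": "frameshift_variant",
     "clinvar_significance": None},
    {"gene": "BRCA2", "consequence": "missense_variant",
     "clinvar_significance": "Uncertain_significance", "clinvar_stars": 1},
    {"gene": "NOVEL1", "consequence": "stop_gained",
     "clinvar_significance": None},
]
tiers = [assign_tier(v, phenotype_genes, disease_genes) for v in examples]
```

Note that the fallthrough is tier 3: anything the rules do not positively recognize lands in front of an expert, so the sketch errs toward more human review rather than less.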
The point isn't to remove humans from the process. It's to make sure humans are doing human work,
exercising judgment on genuinely uncertain cases, rather than manually dismissing variants that an
algorithm could confidently classify.
Tools like Emedgene (Illumina) have been moving in this direction: automated ACMG classification, phenotype-driven prioritization, structured report generation.
What We Can Automate
Here's what's actually actionable in your current stack:
- Start with the tiering script.
- Move your filtering config out of your head. Write it down. Put it in a YAML or JSON file.
- Parallelize your annotation with a workflow manager like Snakemake or Nextflow.
What Automation Can't Do (And Shouldn't Try To)
To be straight with you: there are cases where no amount of tooling gets you to a fast answer. A VUS in a gene with conflicting functional evidence, in a patient with an atypical phenotype, involving a variant type with limited population data: that case needs a human expert, period.
The diagnostic odyssey for rare disease patients (the years of misdiagnoses, the families bouncing between specialists) happens partly because genomic capacity is constrained.
That's the real cost of the current architecture. And it's a cost worth eliminating.
Primary and secondary analysis are largely compute problems. They were hard, smart people worked on them, and now they're mostly solved. Tertiary analysis is still being treated as a biology problem that requires human experts at every step. But most of what makes it slow isn't biology, it's architecture. Serial annotation. Logic in spreadsheets. No tiering layer. Those are engineering problems. And the field has everything it needs to solve them.
The patient waiting on that report doesn't know the difference between a delay caused by variant complexity and a delay caused by a pipeline that runs annotation jobs serially. But we do. And that's on us to fix.