ChoCallate: a computational pipeline for consensus variant calling in genomic sequencing data
Abstract
Accurate variant calling is a cornerstone of genomic analysis, yet the limitations and biases of individual algorithms often compromise the detection of single-nucleotide variants (SNVs) and insertions/deletions (InDels). To address this, we developed ChoCallate, a computational pipeline that implements a voting-based consensus strategy to integrate calls from multiple variant detection tools. Built in Nextflow for portability and scalability, ChoCallate supports flexible consensus rules (majority, N-1, or full agreement among callers) to generate high-confidence variant sets. It is designed to handle samples of varying ploidy, making it particularly valuable for plant and non-model organism genomics. The pipeline accepts raw sequencing reads or aligned BAM files, performs parallelized processing, and outputs quality-filtered, annotated variants in BCF format. By providing a standardized, production-ready framework for ensemble variant calling, ChoCallate removes the need for custom scripting, mitigates tool-specific biases, and supports reproducible, high-confidence variant discovery across diverse genomic studies.
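To make the three consensus rules concrete, the following is a minimal Python sketch of voting across callers. It is an illustration only, not ChoCallate's implementation: the function names (`required_votes`, `consensus`), the rule labels (`"majority"`, `"n-1"`, `"all"`), and the representation of a variant as a `(chrom, pos, ref, alt)` tuple are all assumptions made for this example. It also assumes that calls have already been normalized so that the same variant is represented identically by every caller.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, REF allele, ALT allele).
Variant = Tuple[str, int, str, str]


def required_votes(n_callers: int, rule: str) -> int:
    """Return the minimum number of callers that must report a variant."""
    if n_callers < 1:
        raise ValueError("at least one caller is required")
    if rule == "majority":
        return n_callers // 2 + 1      # strictly more than half
    if rule == "n-1":
        return max(1, n_callers - 1)   # all callers but one
    if rule == "all":
        return n_callers               # full agreement
    raise ValueError(f"unknown consensus rule: {rule!r}")


def consensus(calls: Dict[str, Iterable[Variant]], rule: str = "majority") -> Set[Variant]:
    """Keep the variants reported by at least the required number of callers."""
    threshold = required_votes(len(calls), rule)
    votes: Counter = Counter()
    for variants in calls.values():
        # De-duplicate within a caller so each caller votes at most once per variant.
        votes.update(set(variants))
    return {variant for variant, n in votes.items() if n >= threshold}


# Illustrative input; the caller names and variants are made up for this example.
example_calls = {
    "caller_a": [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")],
    "caller_b": [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")],
    "caller_c": [("chr1", 100, "A", "G"), ("chr2", 75, "T", "C")],
}

if __name__ == "__main__":
    for rule in ("majority", "n-1", "all"):
        print(rule, sorted(consensus(example_calls, rule)))
```

With three callers, the majority and N-1 thresholds coincide at two votes; the rules diverge from five callers onward, where majority requires three votes and N-1 requires four.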