ReViA: Regulatory Variants Annotation Pipeline
Abstract
Motivation. Most disease-associated variants identified in genome-wide association studies lie in non-coding regions, where their functional interpretation remains challenging. Unlike coding mutations, non-coding single nucleotide variants (SNVs) often act by altering gene regulation, so their effects are indirect and harder to infer. Existing tools, including pre-trained machine learning models and annotation databases, are useful resources but are limited when analyzing new datasets or uncovering cohort-specific biology. A method that integrates user-supplied multi-omics data while producing interpretable results may help address this gap.
Results. We present ReViA, a pipeline for annotating non-coding SNVs by integrating gene expression, chromatin activity (e.g., ChIP-seq, ATAC-seq), and variant data measured in the same cohort of samples. This design allows the identification of variants that may be specific to the studied group and relevant to gene regulation. Unlike pre-trained approaches, ReViA adapts to the input dataset, and unlike static databases, it can analyze any set of SNVs, including those not previously reported. ReViA outputs statistical evidence and graphical summaries for all analyses, enabling users to critically assess the findings. We illustrate its use on datasets from glioma and Alzheimer’s disease, where the pipeline highlighted candidate regulatory variants with potential functional impact supported by external evidence. These case studies demonstrate how ReViA can generate biologically informed hypotheses about regulatory variation in disease contexts. In summary, ReViA provides a flexible and interpretable framework for the annotation of non-coding SNVs, complementing existing resources by focusing on cohort-specific, evidence-based analysis.