The proviral pipeline analyzes proviral sequences in several stages, from primer detection and alignment to detailed error reporting and result summarization. This chapter assumes that Docker is installed and that your input data files are prepared. It shows how to run the pipeline with the provided Docker image, using the sample entrypoint for a complete end-to-end example.
Assumptions and Prerequisites
Before you start, please ensure that:
- Docker is installed and running on your system.
- Your current working directory contains the required input files:
  - sample_info.csv (a CSV containing sample metadata such as run_name and sample identifiers)
  - contigs.csv (assembled contig sequences)
  - conseqs.csv (consensus sequences)
  - cascade.csv (MiCall output indicating mapped read counts)
For more details on data preparation and installation, see the Installation and Data Preparation sections.
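The prerequisites above can be checked from the shell before launching anything. The following is a minimal sketch and not part of the pipeline itself: the function names check_docker and check_inputs are hypothetical, and the file names are the four inputs listed above.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; they are not shipped with the pipeline.

# Confirm the docker CLI is on PATH and the daemon answers.
check_docker() {
    command -v docker >/dev/null 2>&1 || { echo "docker not found on PATH"; return 1; }
    docker info >/dev/null 2>&1 || { echo "docker daemon is not reachable"; return 1; }
}

# Report every required input file missing from a directory (default: current one).
# Returns non-zero if at least one file is missing.
check_inputs() {
    dir="${1:-.}"
    missing=0
    for f in sample_info.csv contigs.csv conseqs.csv cascade.csv; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_docker && check_inputs in the working directory prints nothing when everything is in place; otherwise it names what is missing.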
Launching the Pipeline Using Docker
The pipeline image available on Docker Hub is preconfigured to run all necessary steps. This example runs the pipeline through the sample entrypoint, which coordinates the analysis by calling several modules (primer detection, gene splicing, landscape generation, etc.) and ultimately writes a set of output files for review.
To run the pipeline using the sample entrypoint, open your terminal, navigate to your working directory (which contains your input files), and use the following command:
docker run --rm -v .:/w cfelab/proviral sample sample_info.csv contigs.csv conseqs.csv cascade.csv outcome_summary.csv conseqs_primers.csv contigs_primers.csv table_precursor.csv proviral_landscape.csv detailed_results.tar --cfeintact
Command Breakdown
- docker run --rm
  Runs the container and automatically removes it after execution.
- -v .:/w
  Mounts your current directory (which holds both input and eventual output files) into the container's /w folder.
- cfelab/proviral
  Specifies the Docker image containing the proviral pipeline.
- sample
  Instructs the container to use the sample entrypoint (the sample.py module). This entrypoint is ideal for processing a single-sample run.
- Positional parameters:
  sample_info.csv contigs.csv conseqs.csv cascade.csv outcome_summary.csv conseqs_primers.csv contigs_primers.csv table_precursor.csv proviral_landscape.csv detailed_results.tar
  These parameters denote, in order:
  - Input files:
    - sample_info.csv – An input metadata file
    - contigs.csv and conseqs.csv – Input sequence files
    - cascade.csv – The cascade CSV from MiCall
  - Output files:
    - outcome_summary.csv – Overall summary of the analysis
    - conseqs_primers.csv – Primer analysis for consensus sequences
    - contigs_primers.csv – Primer analysis for contigs
    - table_precursor.csv – Data ready for downstream upload
    - proviral_landscape.csv – Data for generating proviral landscape plots
    - detailed_results.tar – Archive of detailed results produced by CFEIntact (or HIVSeqinR)
- --cfeintact
  Specifies that the CFEIntact backend should be used for downstream analysis. You may alternatively specify --hivseqinr if that is preferred.
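Because the command is long and the backend flag is its only varying part, it can help to wrap it in a small shell function. The following is a sketch under stated assumptions, not something the pipeline provides: run_proviral and the DRY_RUN variable are hypothetical names, and the mount uses "$(pwd)" rather than . because some older Docker versions may reject a relative host path.

```shell
#!/bin/sh
# Hypothetical wrapper around the command shown above.
# Usage: run_proviral [cfeintact|hivseqinr]
# Set DRY_RUN=1 to print the command instead of executing it.
run_proviral() {
    backend="${1:-cfeintact}"
    case "$backend" in
        cfeintact|hivseqinr) ;;
        *) echo "unknown backend: $backend" >&2; return 1 ;;
    esac

    # Assemble the full argument list: inputs first, then outputs, then the backend flag.
    set -- docker run --rm -v "$(pwd):/w" cfelab/proviral sample \
        sample_info.csv contigs.csv conseqs.csv cascade.csv \
        outcome_summary.csv conseqs_primers.csv contigs_primers.csv \
        table_precursor.csv proviral_landscape.csv detailed_results.tar \
        "--$backend"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

With DRY_RUN=1, calling run_proviral hivseqinr prints the assembled command so it can be inspected before anything runs.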
Troubleshooting
In case you encounter errors:
- Review the terminal output and any generated log files to identify issues (common errors include missing or misformatted input files).
- Verify that Docker is properly mounting your current directory (using the -v flag).
- Revisit Installation and Data Preparation if input file formats or configurations are uncertain.
- Run the pipeline with the --help option inside Docker (e.g., docker run --rm cfelab/proviral --help) to review available options and usage information.
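A quick way to narrow down a failed run is to check which of the requested output files were actually written. The sketch below assumes that a successful run leaves all six outputs non-empty in the mounted directory; check_outputs and list_details are hypothetical helper names, not pipeline commands.

```shell
#!/bin/sh
# Hypothetical post-run helpers; they are not shipped with the pipeline.

# Report each expected output that is missing or empty in a directory
# (default: current one). Returns non-zero if any output is absent.
check_outputs() {
    dir="${1:-.}"
    bad=0
    for f in outcome_summary.csv conseqs_primers.csv contigs_primers.csv \
             table_precursor.csv proviral_landscape.csv detailed_results.tar; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            bad=1
        fi
    done
    return "$bad"
}

# List the members of the detailed results archive without extracting it.
list_details() {
    tar -tf "${1:-detailed_results.tar}"
}
```

If check_outputs reports a missing file, the terminal output from the run is the next place to look; list_details shows what the backend placed in the archive.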