MiCall at BC CfE

MiCall processes FASTQ data generated by the Illumina MiSeq sequencing platform. Its primary use is HIV resistance testing, and it also supports research into other kinds of sequence analysis, such as examining proviral sequences for defects and other characteristics. Because it is open-source, MiCall is transparent and adaptable: it supports sequence data from a variety of organisms and can accommodate the particular requirements of different studies. Its focus on deep sequencing of viral samples distinguishes it from ReCall, the lab's other sequencing tool, which is used for population-level genomic sequencing. Our laboratory uses MiCall for sequencing HIV, HCV, and SARS-CoV-2.

MiCall results are intended mainly for research and are not used in clinical settings, with one exception: specific requests for V3 loop analysis. This analysis supports drug prescription decisions based on mutations in the V3 region of HIV. All other clinical results come from ReCall. Outside that exception, MiCall serves as a research tool, providing deep sequencing analysis and detailed analysis of proviral sequences.

MiCall operates in two modes:

  • Remapping mode: MiCall maps all reads from a sample against a set of reference sequences, updates those references to match the sample, and remaps the reads. It then combines the mapped reads into consensus sequences and coverage maps. This is the mode typically used for clinical decisions.

  • De novo assembly mode: MiCall assembles sequences from the reads without relying on a reference genome, then uses the assembled sequences to produce the same kinds of consensus sequences and coverage maps. This mode is currently used only for research.
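The remapping steps can be illustrated with a deliberately simplified sketch, assuming ungapped alignment, a fixed mismatch limit, and a majority-vote consensus. The functions `best_offset`, `map_reads`, `consensus_and_coverage`, and `remap` are hypothetical and do not come from MiCall; a production pipeline would use a real read aligner and account for gaps and base quality. The sketch only shows why updating the reference and remapping can recover reads that failed to map the first time.

```python
from collections import Counter


def best_offset(read, reference):
    """Return (offset, mismatches) for the best ungapped placement of read, or None if it cannot fit."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def map_reads(reads, reference, max_mismatches):
    """Keep (offset, read) pairs for reads whose best placement is within the mismatch limit."""
    mapped = []
    for read in reads:
        hit = best_offset(read, reference)
        if hit is not None and hit[1] <= max_mismatches:
            mapped.append((hit[0], read))
    return mapped


def consensus_and_coverage(mapped, reference):
    """Majority-vote consensus per position (reference base where uncovered) plus read depth."""
    counts = [Counter() for _ in reference]
    for offset, read in mapped:
        for i, base in enumerate(read):
            counts[offset + i][base] += 1
    consensus = "".join(
        c.most_common(1)[0][0] if c else ref_base
        for c, ref_base in zip(counts, reference)
    )
    coverage = [sum(c.values()) for c in counts]
    return consensus, coverage


def remap(reads, reference, max_mismatches=2, max_rounds=5):
    """Map reads, replace the reference with the consensus, and repeat until it stops changing."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    for _ in range(max_rounds):
        mapped = map_reads(reads, reference, max_mismatches)
        consensus, coverage = consensus_and_coverage(mapped, reference)
        if consensus == reference:
            break  # converged: remapping no longer changes the reference
        reference = consensus  # use the sample's consensus as the next reference
    return consensus, coverage


reads = ["GATTGC", "TGCAGG", "AGGTCA"]
consensus, coverage = remap(reads, "GATTACACGTCA", max_mismatches=1)
print(consensus)  # GATTGCAGGTCA
print(coverage)   # [1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1]
```

In this example the middle read differs from the original reference at two positions and is rejected in the first round. Once the reference is replaced by the first-round consensus, all three reads map, which is reflected in the doubled coverage where they overlap.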

MiCall is highly automated: it generates results without manual intervention. This lets researchers focus on interpreting sequencing data rather than managing the pipeline, and it gives high-throughput labs timely, consistent data processing.

MiCall programmatically interacts with several systems:

  • QAI - our general Laboratory Information Management System. Files that define individual runs (the SampleSheet.xml files) are produced through QAI's graphical interface and placed in a network location that MiCall monitors: the /MiSeq/runs/ directory on the network-mounted RAW_DATA drive. MiCall also uses the REST interface of QAI's web server to update the database with new run results.

  • CFE-scripts - a collection of scripts that produce resistance interpretation reports. These scripts watch for new, unprocessed MiCall results, extract and reshape them, and upload them to the laboratory's database. Specifically, the miseq_gen_results.rb script polls for MiCall's resistance interpretation scores, which are stored in the same location as all other inputs and outputs, and uploads any new ones.

  • Kive - our platform for running and version-controlling bioinformatic pipelines. MiCall uses Kive's Python interface to launch new jobs, upload inputs, and download processing results.

  • MiSeq hardware - the physical machines that perform sequencing for the laboratory. MiCall continuously monitors specific network locations for new data from the MiSeq sequencers. When new data arrives, in the form of FASTQ files and supporting files such as those containing read quality information, MiCall uses Kive to start and manage the analysis workflows.
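The monitoring described in this list follows a common polling pattern: scan a directory, pick out items that are ready and have not been handled yet, hand each one off exactly once, and remember it. The sketch below illustrates that pattern, assuming each run folder signals readiness with a marker file. The marker name `needsprocessing`, the in-memory `processed` set, and the `submit` callback (standing in for whatever starts an analysis job, such as a Kive submission) are hypothetical and are not taken from MiCall's actual implementation.

```python
import time
from pathlib import Path
from typing import Callable

READY_MARKER = "needsprocessing"  # hypothetical marker file name signalling a run is ready


def find_new_runs(runs_dir: Path, processed: set[str]) -> list[Path]:
    """Return run folders that contain the ready marker and have not been handled yet."""
    new_runs = []
    for run_folder in sorted(runs_dir.iterdir()):
        if not run_folder.is_dir() or run_folder.name in processed:
            continue  # skip stray files and runs already submitted
        if (run_folder / READY_MARKER).exists():
            new_runs.append(run_folder)
    return new_runs


def poll_once(runs_dir: Path, processed: set[str], submit: Callable[[Path], None]) -> None:
    """Hand each newly ready run to submit() exactly once and record it as processed."""
    for run_folder in find_new_runs(runs_dir, processed):
        submit(run_folder)  # hypothetical hook, e.g. start an analysis job
        processed.add(run_folder.name)


def watch(runs_dir: Path, submit: Callable[[Path], None], interval_seconds: float = 60.0) -> None:
    """Poll forever, sleeping between scans."""
    processed: set[str] = set()
    while True:
        poll_once(runs_dir, processed, submit)
        time.sleep(interval_seconds)
```

A production watcher would persist which runs it has handled (for example, with marker files or database records) so that a restart does not resubmit old runs; the in-memory set here keeps the sketch short.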

MiCall's deployment options support reliable, reproducible analysis. It can run in Docker for simpler setups, or in Singularity containers within the Kive platform for production use. Singularity provides lightweight, reproducible environments suited to high-performance computing, and combined with Kive it lets MiCall run efficiently and reliably at scale.