How to Analyze ppmSeq Data
Analysis Pipeline for Ultima ppmSeq Data
Secondary Analysis Steps For ppmSeq Data
Trimming
The Ultima Trimmer software tool divides each read into segments, such as ppmSeq adapter sequence, Ultima barcode, and insert for downstream processing.
Alignment and Tagging
Reads are tagged with their segments (ppmSeq adapter sequence, Ultima barcode, and insert), and the insert sequence is aligned to a reference. For more information on realignment, see the Ultima Aligner (UA) GitHub page.
Demultiplexing
Reads are sorted, per sample, into folders based on Ultima barcodes.
Sample QC
A ppmSeq-specific QC file is generated. Example: ppmSeq QC report.
File Output
Read data is output as a variant-calling-ready CRAM file.
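The trimming and demultiplexing steps above can be pictured with a toy sketch. It is not the Ultima Trimmer: the fixed-offset read layout (adapter, then barcode, then insert), the segment lengths, and the names split_read, demultiplex, and barcode_to_sample are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Segments(NamedTuple):
    adapter: str
    barcode: str
    insert: str


def split_read(read: str, adapter_len: int, barcode_len: int) -> Segments:
    """Split a read by fixed offsets (hypothetical layout: adapter, barcode, insert)."""
    if len(read) < adapter_len + barcode_len:
        raise ValueError("read is shorter than adapter + barcode")
    barcode_end = adapter_len + barcode_len
    return Segments(read[:adapter_len], read[adapter_len:barcode_end], read[barcode_end:])


def demultiplex(
    reads: Iterable[str],
    barcode_to_sample: Dict[str, str],
    adapter_len: int,
    barcode_len: int,
) -> Dict[str, List[str]]:
    """Group insert sequences per sample; unknown barcodes go to 'undetermined'."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        segments = split_read(read, adapter_len, barcode_len)
        sample = barcode_to_sample.get(segments.barcode, "undetermined")
        by_sample[sample].append(segments.insert)
    return dict(by_sample)


if __name__ == "__main__":
    # Made-up reads: 4-base adapter, 3-base barcode, remainder is the insert.
    reads = ["AAAACGTTTGCA", "AAAAGGAACCGT", "AAAATTTGGGCC"]
    samples = {"CGT": "sample_1", "GGA": "sample_2"}
    print(demultiplex(reads, samples, adapter_len=4, barcode_len=3))
```

A production trimmer would typically match adapter and barcode patterns with error tolerance rather than rely on fixed offsets; the sketch only shows how segmenting a read makes per-sample sorting by barcode possible.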
Running Ultima analysis pipelines on AWS HealthOmics
Ultima Genomics offers pipelines as Ready2Run workflows on AWS HealthOmics. Ready2Run workflows enable you to run these pipelines on AWS HealthOmics by simply bringing your data. For more flexibility, such as the use of larger file sizes or changing the reference genome, you can convert Ready2Run workflows to private workflows.
To get started, visit the Ultima Genomics repository for workflows compatible with AWS HealthOmics.
Each Ready2Run workflow folder contains the following:
Required WDL file(s)
“How To” documentation that details the workflow and how to run it outside of WDL
Documentation of the WDL inputs and outputs
.json file that lists the parameters for the workflow
Folder with optional input templates with default parameters for the WDL
Folder containing tasks the WDL is running
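As a rough illustration of starting a Ready2Run workflow programmatically, the sketch below assembles a request for the boto3 "omics" client's start_run call. The workflow ID, role ARN, S3 paths, and the parameter name input_cram are placeholders, not values taken from the Ultima repository; the real parameter names come from the .json file in each workflow folder.

```python
from typing import Any, Dict


def build_run_request(
    workflow_id: str,
    role_arn: str,
    output_uri: str,
    parameters: Dict[str, Any],
    run_name: str = "ppmseq-run",
) -> Dict[str, Any]:
    """Assemble keyword arguments for the HealthOmics start_run call."""
    return {
        "workflowId": workflow_id,
        "workflowType": "READY2RUN",  # use "PRIVATE" for a converted private workflow
        "roleArn": role_arn,
        "name": run_name,
        "parameters": parameters,
        "outputUri": output_uri,
    }


def start_run(request: Dict[str, Any]) -> str:
    """Submit the run and return its ID (requires AWS credentials and boto3)."""
    import boto3  # imported lazily so the request can be built without boto3 installed

    client = boto3.client("omics")
    response = client.start_run(**request)
    return response["id"]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    request = build_run_request(
        workflow_id="1234567",
        role_arn="arn:aws:iam::123456789012:role/ExampleOmicsRole",
        output_uri="s3://example-bucket/ppmseq-output/",
        parameters={"input_cram": "s3://example-bucket/sample.cram"},
    )
    print(request)  # call start_run(request) once the values are real
```

Keeping request construction separate from submission makes it easy to review the parameters against the workflow's documented inputs before any run is started.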
For further questions about these workflows, contact us by email at healthomics.support@ultimagen.com
Single Read SNV (SRSNV) Pipeline
The UG Single Read SNV (SRSNV) calling pipeline assesses the quality of individual base substitutions relative to the reference genome, denoted SNVs for convenience. Each SNV found in the CRAM file is recorded, along with its features, in a custom VCF file called a FeatureMap. A machine learning model is trained on these features to assign each SNV a quality score (SNVQ). Base quality (BQ), the standard substitution quality metric, measures the likelihood of any ALT at a given locus, aggregated across the three possible substitutions, which is inherently lossy. SNVQ instead measures the likelihood of each ALT separately, which makes it a more precise quality metric than BQ.
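The difference between an aggregated score and a per-ALT score can be illustrated with the standard Phred conversion, Q = -10 * log10(p). This is only an arithmetic illustration: the error probabilities are made up, and the actual SNVQ is produced by the trained model, not by this formula.

```python
import math
from typing import Dict


def phred(error_probability: float) -> float:
    """Convert an error probability into a Phred-scaled quality."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def aggregated_quality(per_alt_error: Dict[str, float]) -> float:
    """BQ-like score: one value for 'any ALT', summing the per-ALT probabilities."""
    return phred(sum(per_alt_error.values()))


def per_alt_quality(per_alt_error: Dict[str, float]) -> Dict[str, float]:
    """SNVQ-like scores: one value for each possible ALT."""
    return {alt: phred(p) for alt, p in per_alt_error.items()}


if __name__ == "__main__":
    # Hypothetical error probabilities for a reference base C.
    per_alt_error = {"A": 1e-4, "G": 1e-6, "T": 1e-7}
    print("aggregated:", round(aggregated_quality(per_alt_error), 2))
    for alt, q in per_alt_quality(per_alt_error).items():
        print(f"C>{alt}:", round(q, 2))
```

In this made-up case the aggregated score is dominated by the most error-prone substitution, so the two rarer substitutions are assigned a much lower quality than they would receive individually. That is the sense in which aggregation is lossy.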
For detailed instructions on implementing this workflow, please visit: https://github.com/Ultimagen/healthomics-workflows/blob/main/workflows/single_read_snv/howto-single-read-snv.md#single-read-snv-srsnv-pipeline
Tumor-informed MRD with ppmSeq
The UG pipeline for tumor-informed MRD measures the tumor fraction in cfDNA from the presence of tumor-specific SNVs. The input data generally consists of three aligned CRAM files:
cfDNA (plasma)
Tumor tissue (FFPE / FF)
Normal tissue (buffy coat / PBMCs)
The analysis of MRD data is composed of three parts:
Tumor signature mutation calling, where somatic variant calling on the tumor and normal tissues identifies the tumor's somatic mutation signature (by default with UG Somatic DeepVariant, though the signature can be provided from other callers).
Single Read SNV pipeline, where all SNV candidates relative to the reference genome are extracted from the cfDNA CRAM file into a FeatureMap VCF, annotated, and assigned a quality score (SNVQ).
Intersection and MRD data analysis, where the FeatureMap and signature are intersected and filtered, then reads supporting the tumor mutations are counted and a tumor fraction is measured. Control signatures, e.g. from other patients in the cohort, can be added to estimate the background noise; in addition, control signatures are generated from a somatic mutation database.
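The intersection and counting step can be sketched as follows. This is a simplified illustration, not the UG implementation: the data structures, the SNVQ threshold of 60, and the estimate itself (supporting reads divided by total depth at signature loci, ignoring zygosity and copy number) are assumptions.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Snv(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def count_supporting_reads(
    featuremap: Iterable[Tuple[Snv, float]], signature: Set[Snv], min_snvq: float
) -> int:
    """Count FeatureMap entries (one per read) that match the signature and pass the SNVQ filter."""
    return sum(1 for snv, snvq in featuremap if snv in signature and snvq >= min_snvq)


def estimate_tumor_fraction(
    featuremap: Iterable[Tuple[Snv, float]],
    signature: Set[Snv],
    coverage: Dict[Snv, int],
    min_snvq: float = 60.0,  # hypothetical threshold
) -> float:
    """Supporting reads divided by the total read depth over all signature loci."""
    total_depth = sum(coverage.get(snv, 0) for snv in signature)
    if total_depth == 0:
        raise ValueError("no coverage at signature loci")
    return count_supporting_reads(featuremap, signature, min_snvq) / total_depth


if __name__ == "__main__":
    # Made-up signature, FeatureMap entries, and per-locus depths.
    tumor_sig = {Snv("chr1", 100, "C", "T"), Snv("chr2", 200, "G", "A")}
    featuremap = [
        (Snv("chr1", 100, "C", "T"), 70.0),  # matches the signature, passes the filter
        (Snv("chr1", 100, "C", "T"), 40.0),  # matches the signature, fails the filter
        (Snv("chr3", 5, "A", "G"), 80.0),    # not in the signature
    ]
    depth = {snv: 1000 for snv in tumor_sig}
    print(estimate_tumor_fraction(featuremap, tumor_sig, depth))
```

Running the same function with a control signature in place of the tumor signature gives a comparable background estimate, which is the role the control signatures play in the description above.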

For detailed instructions on implementing this workflow, please visit: https://github.com/Ultimagen/healthomics-workflows/blob/main/workflows/mrd_featuremap/howto-mrd-wg-analysis.md