modalysis

modalysis is a pipeline-oriented toolkit for methylation and differentially methylated region (DMR) analysis. It exposes a CLI, a FastAPI server, and reusable Python modules.

All user-facing operations run through this stack:

CLI parser -> CLI handler -> HTTP client -> FastAPI server -> core function

Prerequisites

  • Python 3.13+

  • uv

Install dependencies:

uv sync

Required Input Types

The pipeline expects these input categories:

  • GFF annotation file (for gene coordinates and descriptions)

  • Pileup .bed files (per sample/modification)

  • DMR .bed files

  • Expression TSV files (GENE_ID<TAB>STATUS, such as UP, DOWN, NDE)

  • Allowed chromosomes file (one chromosome name per line)

Output Types

  • Tabular command outputs: .modalysis (TSV)

  • Plot command outputs: .png

  • Optional workbook from dmr gene-counts --output-excel: .xlsx

Command Reference

Default server port: 8000.

modalysis server

Purpose: Start the FastAPI server used by all analysis commands.

Algorithm:

  • Launches fastapi run (or fastapi dev with --dev) against src/modalysis/server/main.py.

Usage:

uv run modalysis server [--port 8000] [--dev]

Parameters:

Flag    Required  Default  Description
--port  No        8000     Server port.
--dev   No        False    Enables autoreload development mode.

Output:

  • Running HTTP server (no .modalysis file).


modalysis gff format

Purpose: Normalize a raw GFF into the pipeline’s compact .modalysis gene table.

Algorithm:

  • Reads TSV rows from the source GFF.

  • Keeps only rows with exactly 9 columns.

  • Keeps only protein_coding_gene features.

  • Filters to chromosomes present in --allowed-chromosomes.

  • Converts start coordinate to zero-based (start - 1).

  • Extracts ID and description from attributes.

  • Writes columns: CHROMOSOME, START, END, GENE_ID, DESCRIPTION.
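The steps above can be sketched for a single GFF line as follows. This is a minimal illustration, not the project's actual code: the function name `format_gff_line` is hypothetical, and it assumes attributes follow the usual GFF3 `key=value;key=value` layout with a `description` key.

```python
def format_gff_line(line, allowed_chromosomes):
    """Return (chromosome, start, end, gene_id, description), or None if the row is dropped."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None  # comments, blank lines, and malformed rows
    chrom, _source, feature, start, end, _score, _strand, _phase, attributes = fields
    if feature != "protein_coding_gene" or chrom not in allowed_chromosomes:
        return None
    # Assumption: attributes are "key=value" pairs separated by ";".
    attrs = {}
    for pair in attributes.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    # GFF starts are 1-based; subtracting 1 gives a zero-based start.
    return (chrom, int(start) - 1, int(end), attrs.get("ID", ""), attrs.get("description", ""))


row = "chr1\tsrc\tprotein_coding_gene\t100\t200\t.\t+\t.\tID=G1;description=kinase"
print(format_gff_line(row, {"chr1"}))
```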

Usage:

uv run modalysis gff format \
  --input-path /path/to/input.gff \
  --output-path /path/to/output_dir \
  --output-name formatted_gff \
  --allowed-chromosomes /path/to/allowed_chromosomes.txt \
  [--port 8000]

Parameters:

Flag                   Required  Default  Description
--input-path           Yes       -        Input GFF path.
--output-path          Yes       -        Output directory.
--output-name          Yes       -        Output basename (.modalysis appended).
--allowed-chromosomes  Yes       -        File with one valid chromosome per line.
--port                 No        8000     Server port.

Output:

  • /path/to/output_dir/formatted_gff.modalysis


modalysis gff annotate

Purpose: Attach expression status labels to each GFF gene row.

Algorithm:

  • Loads each expression TSV as {GENE_ID -> STATUS}.

  • For every gene in formatted GFF, looks up each expression source.

  • Writes joined annotations like LABEL: VALUE; LABEL2: VALUE2 into EXPRESSION.
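A rough sketch of the lookup-and-join step is shown below. The helper names and the `NA` placeholder for genes missing from an expression file are assumptions made for illustration; the README does not say how missing genes are reported.

```python
def load_expression(lines):
    """Build {GENE_ID: STATUS} from 'GENE_ID<TAB>STATUS' lines."""
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def expression_field(gene_id, sources, labels, missing="NA"):
    """Join one 'LABEL: VALUE' entry per expression source, in the order given."""
    return "; ".join(
        f"{label}: {source.get(gene_id, missing)}"  # 'missing' is a hypothetical placeholder
        for label, source in zip(labels, sources)
    )


expr_a = load_expression(["G1\tUP\n", "G2\tDOWN\n"])
expr_b = load_expression(["G1\tNDE\n"])
print(expression_field("G1", [expr_a, expr_b], ["tissue_a", "tissue_b"]))
```

Because labels are paired with files by position, the order of --expression-labels must match the order of --expression-paths.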

Usage:

uv run modalysis gff annotate \
  --gff-path /path/to/formatted_gff.modalysis \
  --expression-paths /path/to/expr_a.tsv /path/to/expr_b.tsv \
  --expression-labels tissue_a tissue_b \
  --output-path /path/to/output_dir \
  --output-name annotated_gff \
  [--port 8000]

Parameters:

Flag                 Required  Default  Description
--gff-path           Yes       -        Formatted GFF .modalysis path.
--expression-paths   Yes       -        One or more expression TSV files.
--expression-labels  Yes       -        Label per expression file (same order).
--output-path        Yes       -        Output directory.
--output-name        Yes       -        Output basename.
--port               No        8000     Server port.

Output:

  • /path/to/output_dir/annotated_gff.modalysis with added EXPRESSION column.


modalysis pileup format

Purpose: Normalize raw pileup records into a minimal .modalysis representation.

Algorithm:

  • Reads raw pileup rows.

  • Keeps only rows with exactly 18 columns.

  • Filters by allowed chromosomes.

  • Extracts columns for genomic key and counts.

  • Writes columns: CHROMOSOME, START, END, MODIFICATION, N_VALID_COV, N_MOD.
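The README does not state which of the 18 input columns feed the output, so the sketch below makes an explicit assumption: that the raw pileup follows the modkit-style bedMethyl layout, with the modification code in column 4, valid coverage in column 10, and modified-read count in column 12. The function name is hypothetical.

```python
def format_pileup_line(line, allowed_chromosomes):
    """Return (chromosome, start, end, modification, n_valid_cov, n_mod) or None."""
    fields = line.split()  # accepts tab- or space-separated rows
    if len(fields) != 18 or fields[0] not in allowed_chromosomes:
        return None
    chrom, start, end, modification = fields[0], int(fields[1]), int(fields[2]), fields[3]
    # Assumed bedMethyl positions (zero-based indexes 9 and 11).
    n_valid_cov, n_mod = int(fields[9]), int(fields[11])
    return (chrom, start, end, modification, n_valid_cov, n_mod)


raw = ["chr1", "10", "11", "m", "5", ".", "10", "11", "255,0,0",
       "5", "40.00", "2", "3", "0", "0", "0", "0", "0"]
print(format_pileup_line("\t".join(raw), {"chr1"}))
```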

Usage:

uv run modalysis pileup format \
  --input-path /path/to/raw_pileup.bed \
  --output-path /path/to/output_dir \
  --output-name sample_mod \
  --allowed-chromosomes /path/to/allowed_chromosomes.txt \
  [--port 8000]

Parameters:

Flag                   Required  Default  Description
--input-path           Yes       -        Raw pileup file path.
--output-path          Yes       -        Output directory.
--output-name          Yes       -        Output basename.
--allowed-chromosomes  Yes       -        File with one valid chromosome per line.
--port                 No        8000     Server port.

Output:

  • /path/to/output_dir/sample_mod.modalysis


modalysis pileup merge

Purpose: Aggregate multiple formatted pileup files by genomic key.

Algorithm:

  • Uses key (CHROMOSOME, START, END, MODIFICATION).

  • Sums N_VALID_COV and N_MOD across files.

  • Tracks in how many files each key appears.

  • Filters keys using:

    • minimum file count (--min-files)

    • minimum file coverage percentage (--min-file-coverage)

    • minimum total reads (--min-reads)

  • Writes merged rows with TOTAL_FILES and N_FILES.
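The merge logic can be illustrated with a small in-memory sketch. It assumes that all three thresholds are inclusive and combined with AND, and that file coverage is computed as 100 * N_FILES / TOTAL_FILES; those details are not spelled out above, and `merge_pileups` is a hypothetical name.

```python
from collections import defaultdict


def merge_pileups(files, min_files=2, min_file_coverage=50.0, min_reads=5):
    """files: list of row lists; each row is (chrom, start, end, mod, n_valid_cov, n_mod)."""
    totals = defaultdict(lambda: [0, 0, 0])  # key -> [n_valid_cov, n_mod, n_files]
    for rows in files:
        seen = set()
        for chrom, start, end, mod, n_valid, n_mod in rows:
            key = (chrom, start, end, mod)
            totals[key][0] += n_valid
            totals[key][1] += n_mod
            if key not in seen:  # count each file at most once per key
                seen.add(key)
                totals[key][2] += 1
    total_files = len(files)
    merged = []
    for key, (n_valid, n_mod, n_files) in sorted(totals.items()):
        coverage = 100.0 * n_files / total_files
        if n_files >= min_files and coverage >= min_file_coverage and n_valid >= min_reads:
            merged.append((*key, n_valid, n_mod, total_files, n_files))
    return merged


a = [("chr1", 10, 11, "m", 4, 2), ("chr1", 20, 21, "m", 9, 1)]
b = [("chr1", 10, 11, "m", 3, 1)]
print(merge_pileups([a, b]))
```

With the defaults, the position at 20 is dropped because it appears in only one of the two files, even though its read count passes --min-reads.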

Usage:

uv run modalysis pileup merge \
  --pileup-paths /path/to/a.modalysis /path/to/b.modalysis \
  --output-path /path/to/output_dir \
  --output-name merged_mod \
  [--min-files 2] \
  [--min-file-coverage 50.0] \
  [--min-reads 5] \
  [--port 8000]

Parameters:

Flag                 Required  Default  Description
--pileup-paths       Yes       -        Formatted pileup inputs to merge.
--output-path        Yes       -        Output directory.
--output-name        Yes       -        Output basename.
--min-files          No        2        Minimum files containing key.
--min-file-coverage  No        50.0     Minimum % of files containing key.
--min-reads          No        5        Minimum summed N_VALID_COV.
--port               No        8000     Server port.

Output:

  • /path/to/output_dir/merged_mod.modalysis


modalysis dmr format

Purpose: Filter and normalize raw DMR rows into a consistent .modalysis table.

Algorithm:

  • Reads raw DMR rows.

  • Keeps only rows with exactly 23 columns.

  • Filters by allowed chromosomes.

  • Applies thresholds on score, p-value, sample percentages, and read counts.

  • Writes retained rows with columns: CHROMOSOME, START, END, SCORE, MAP_BASED_P_VALUE, EFFECT_SIZE, PCT_A_SAMPLES, PCT_B_SAMPLES.
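As a sketch of how the thresholds might combine, the predicate below works on an already-parsed row. The dictionary keys (`score`, `p_value`, `pct_a`, `pct_b`, `reads_a`, `reads_b`) are hypothetical names, since the raw 23-column layout is not described here, and inclusive comparisons for the percentage and read thresholds are an assumption.

```python
def keep_dmr(row, min_score=5, max_p_value=0.05,
             min_pct_a=50.0, min_pct_b=50.0, min_reads=5):
    """Return True when a parsed DMR row passes every threshold."""
    return (
        row["score"] >= min_score
        and row["p_value"] <= max_p_value
        and row["pct_a"] >= min_pct_a
        and row["pct_b"] >= min_pct_b
        # --min-reads applies to both groups.
        and row["reads_a"] >= min_reads
        and row["reads_b"] >= min_reads
    )


row = {"score": 7.2, "p_value": 0.01, "pct_a": 80.0, "pct_b": 60.0,
       "reads_a": 12, "reads_b": 9}
print(keep_dmr(row))
print(keep_dmr({**row, "p_value": 0.2}))
```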

Usage:

uv run modalysis dmr format \
  --input-path /path/to/raw_dmr.bed \
  --output-path /path/to/output_dir \
  --output-name dmr_formatted \
  --allowed-chromosomes /path/to/allowed_chromosomes.txt \
  [--min-score 5] \
  [--max-p-value 0.05] \
  [--min-pct-a-samples 50.0] \
  [--min-pct-b-samples 50.0] \
  [--min-reads 5] \
  [--port 8000]

Parameters:

Flag                   Required  Default  Description
--input-path           Yes       -        Raw DMR file path.
--output-path          Yes       -        Output directory.
--output-name          Yes       -        Output basename.
--allowed-chromosomes  Yes       -        File with one valid chromosome per line.
--min-score            No        5        Keep rows with score >= this value.
--max-p-value          No        0.05     Keep rows with p-value <= this value.
--min-pct-a-samples    No        50.0     Minimum % A-group samples.
--min-pct-b-samples    No        50.0     Minimum % B-group samples.
--min-reads            No        5        Minimum read count in both groups.
--port                 No        8000     Server port.

Output:

  • /path/to/output_dir/dmr_formatted.modalysis


modalysis dmr annotate

Purpose: Annotate each formatted DMR interval with overlapping gene regions.

Algorithm:

  • Parses formatted GFF genes by chromosome.

  • Builds promoter/body/enhancer regions per gene.

  • For each DMR interval, finds overlapping genes in each region.

  • Appends columns PROMOTER, BODY, ENHANCER (comma-separated gene IDs).
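The region overlap step might look roughly like the following. The window sizes (2000 bp for both promoter and enhancer), the placement of the promoter before the gene start and the enhancer after the gene end, and the half-open interval convention are all assumptions for illustration; the README does not define them.

```python
def build_regions(start, end, promoter_bp=2000, enhancer_bp=2000):
    """Half-open regions around a gene; window sizes are hypothetical."""
    return {
        "PROMOTER": (max(0, start - promoter_bp), start),
        "BODY": (start, end),
        "ENHANCER": (end, end + enhancer_bp),
    }


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate_dmr(dmr_start, dmr_end, genes):
    """genes: list of (start, end, gene_id) on the DMR's chromosome."""
    hits = {"PROMOTER": [], "BODY": [], "ENHANCER": []}
    for start, end, gene_id in genes:
        for region, (r_start, r_end) in build_regions(start, end).items():
            if overlaps(dmr_start, dmr_end, r_start, r_end):
                hits[region].append(gene_id)
    # Comma-separated gene IDs per region, matching the appended columns.
    return {region: ",".join(ids) for region, ids in hits.items()}


genes = [(5000, 6000, "G1"), (9000, 9500, "G2")]
print(annotate_dmr(4900, 5100, genes))
```

A DMR that straddles a boundary, as in the usage line, is reported in both adjacent regions under this convention.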

Usage:

uv run modalysis dmr annotate \
  --dmr-path /path/to/dmr_formatted.modalysis \
  --gff-path /path/to/formatted_gff.modalysis \
  --output-path /path/to/output_dir \
  --output-name dmr_annotated \
  [--port 8000]

Parameters:

Flag           Required  Default  Description
--dmr-path     Yes       -        Formatted DMR .modalysis path.
--gff-path     Yes       -        Formatted GFF .modalysis path.
--output-path  Yes       -        Output directory.
--output-name  Yes       -        Output basename.
--port         No        8000     Server port.

Output:

  • /path/to/output_dir/dmr_annotated.modalysis


modalysis dmr gene-counts

Purpose: Count unique genes by manifestation, expression profile, effect sign, modification, and region.

Algorithm:

  • Validates all list arguments have compatible lengths.

  • Loads expression mapping from annotated GFF EXPRESSION field.

  • Reads annotated DMR files and groups genes by: (manifestation, expression_profile, effect_sign, modification, region).

  • Uses unique gene sets to avoid duplicate counts.

  • Writes TSV summary rows.

  • Optional: writes grouped-header Excel workbook (--output-excel).
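The grouping and de-duplication described above can be sketched with a dictionary of sets. The record layout and the function name are assumptions; the NEGATIVE / NON_NEGATIVE labels follow the values accepted by --effect-signs elsewhere in this reference.

```python
from collections import defaultdict


def count_genes(records):
    """records: (manifestation, expression_profile, effect_size, modification, region, gene_id)."""
    groups = defaultdict(set)
    for manifestation, profile, effect_size, modification, region, gene_id in records:
        sign = "NEGATIVE" if effect_size < 0 else "NON_NEGATIVE"
        # A set keeps each gene once per group, however many DMRs hit it.
        groups[(manifestation, profile, sign, modification, region)].add(gene_id)
    return {key: len(genes) for key, genes in groups.items()}


records = [
    ("M1", "UP", -0.3, "5MC", "PROMOTER", "G1"),
    ("M1", "UP", -0.1, "5MC", "PROMOTER", "G1"),  # second DMR on the same gene
    ("M1", "UP", -0.2, "5MC", "PROMOTER", "G2"),
    ("M1", "UP", 0.4, "5MC", "BODY", "G1"),
]
print(count_genes(records))
```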

Usage:

uv run modalysis dmr gene-counts \
  --annotated-dmr-paths /path/to/a.modalysis /path/to/b.modalysis \
  --manifestations M1 M1 \
  --modifications 5MC 5MC_5HMC \
  --manifestation-labels M1 \
  --expression-labels tissue_1 \
  --annotated-gff-path /path/to/gff_annotated.modalysis \
  --output-path /path/to/output_dir \
  --output-name gene_counts \
  [--output-excel] \
  [--port 8000]

Parameters:

Flag                    Required  Default  Description
--annotated-dmr-paths   Yes       -        Annotated DMR inputs.
--manifestations        Yes       -        Manifestation label per DMR input.
--modifications         Yes       -        Modification label per DMR input.
--manifestation-labels  Yes       -        Canonical manifestation labels used for expression matching.
--expression-labels     Yes       -        Expression labels mapped to --manifestation-labels in order.
--annotated-gff-path    Yes       -        Annotated GFF with EXPRESSION column.
--output-path           Yes       -        Output directory.
--output-name           Yes       -        Output basename.
--output-excel          No        False    Also write .xlsx workbook.
--port                  No        8000     Server port.

Output:

  • /path/to/output_dir/gene_counts.modalysis

  • Optional: /path/to/output_dir/gene_counts.xlsx


modalysis dmr common-genes

Purpose: Find genes shared between two modifications for each manifestation and region.

Algorithm:

  • Validates list lengths and that modification A/B differ.

  • Loads gene expression status from annotated GFF.

  • From annotated DMRs, collects genes from negative effect-size rows only.

  • For each manifestation and region, computes set intersection between modification A and B.

  • Writes summary rows and per-gene rows including expression status.
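The intersection step can be illustrated as below. The row layout and the function name are hypothetical; the error message mirrors the one listed under Troubleshooting.

```python
from collections import defaultdict


def common_genes(rows, modification_a, modification_b):
    """rows: (manifestation, modification, region, effect_size, gene_id)."""
    if modification_a == modification_b:
        raise ValueError("Modification A and B must be different")
    gene_sets = defaultdict(set)
    for manifestation, modification, region, effect_size, gene_id in rows:
        if effect_size < 0:  # only negative effect-size rows contribute
            gene_sets[(manifestation, modification, region)].add(gene_id)
    shared = {}
    for (manifestation, modification, region), genes in gene_sets.items():
        if modification == modification_a:
            other = gene_sets.get((manifestation, modification_b, region), set())
            shared[(manifestation, region)] = sorted(genes & other)
    return shared


rows = [
    ("M1", "5MC", "PROMOTER", -0.2, "G1"),
    ("M1", "5MC", "PROMOTER", -0.1, "G2"),
    ("M1", "5MC_5HMC", "PROMOTER", -0.3, "G1"),
    ("M1", "5MC_5HMC", "PROMOTER", 0.4, "G2"),  # non-negative, ignored
]
print(common_genes(rows, "5MC", "5MC_5HMC"))
```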

Usage:

uv run modalysis dmr common-genes \
  --annotated-dmr-paths /path/to/a.modalysis /path/to/b.modalysis \
  --manifestations M1 M1 \
  --modifications 5MC 5MC_5HMC \
  --manifestation-labels M1 \
  --expression-labels tissue_1 \
  --modification-a 5MC \
  --modification-b 5MC_5HMC \
  --annotated-gff-path /path/to/gff_annotated.modalysis \
  --output-path /path/to/output_dir \
  --output-name common_genes \
  [--port 8000]

Parameters:

Flag                    Required  Default  Description
--annotated-dmr-paths   Yes       -        Annotated DMR inputs.
--manifestations        Yes       -        Manifestation label per DMR input.
--modifications         Yes       -        Modification label per DMR input.
--manifestation-labels  Yes       -        Canonical manifestation labels used for expression matching.
--expression-labels     Yes       -        Expression labels mapped to manifestations by order.
--modification-a        Yes       -        First modification for intersection.
--modification-b        Yes       -        Second modification for intersection.
--annotated-gff-path    Yes       -        Annotated GFF with expression data.
--output-path           Yes       -        Output directory.
--output-name           Yes       -        Output basename.
--port                  No        8000     Server port.

Output:

  • /path/to/output_dir/common_genes.modalysis


modalysis plot mean-methylation

Purpose: Plot mean methylation by chromosome, grouped by region (promoter/body/enhancer).

Algorithm:

  • Builds gene regions from formatted GFF.

  • For each merged pileup file, accumulates N_MOD / N_VALID_COV by chromosome and region.

  • Draws line plots across region-partitioned x-axis.

  • Supports optional chromosome ordering and custom title.
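One plausible reading of the accumulation step is a mean of per-site fractions, sketched below. This is an assumption: the tool could instead pool counts (sum of N_MOD divided by sum of N_VALID_COV), which weights high-coverage sites more heavily. The input layout and function name are hypothetical.

```python
from collections import defaultdict


def mean_methylation(sites):
    """sites: (chromosome, region, n_mod, n_valid_cov) for sites already assigned to a region."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, region, n_mod, n_valid in sites:
        if n_valid > 0:  # skip sites without coverage to avoid division by zero
            sums[(chrom, region)] += n_mod / n_valid
            counts[(chrom, region)] += 1
    return {key: sums[key] / counts[key] for key in sums}


sites = [
    ("chr1", "BODY", 1, 4),
    ("chr1", "BODY", 3, 4),
    ("chr1", "PROMOTER", 0, 10),
    ("chr2", "BODY", 5, 0),  # no valid coverage, ignored
]
print(mean_methylation(sites))
```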

Usage:

uv run modalysis plot mean-methylation \
  --gff-path /path/to/formatted_gff.modalysis \
  --merged-pileup-paths /path/to/m1.modalysis /path/to/m2.modalysis \
  --labels 5MC 5MC_5HMC \
  --output-path /path/to/output_dir \
  --output-name mean_methylation \
  [--y-min 0.0] \
  [--y-max 0.1] \
  [--chromosome-order-path /path/to/order.txt] \
  [--plot-title "Custom Title"] \
  [--port 8000]

Parameters:

Flag                     Required  Default  Description
--gff-path               Yes       -        Formatted GFF .modalysis.
--merged-pileup-paths    Yes       -        One or more merged pileup .modalysis files.
--labels                 Yes       -        Display label per merged pileup path.
--output-path            Yes       -        Output directory.
--output-name            Yes       -        Output basename (.png).
--y-min                  No        0.0      Y-axis lower bound.
--y-max                  No        0.1      Y-axis upper bound.
--chromosome-order-path  No        None     Optional chromosome ordering file.
--plot-title             No        None     Optional plot title override.
--port                   No        8000     Server port.

Output:

  • /path/to/output_dir/mean_methylation.png


modalysis plot gene-heatmap

Purpose: Generate gene-level heatmaps for manifestation/expression/effect-sign/modification combinations.

Algorithm:

  • Builds manifestation->expression label mapping.

  • Loads per-gene expression from annotated GFF.

  • Collects genes per combination from annotated DMRs.

  • Accumulates per-gene methylation means from merged pileups.

  • Renders one heatmap per non-empty combination with shared color scale.

Usage:

uv run modalysis plot gene-heatmap \
  --annotated-dmr-paths /path/to/dmr1.modalysis /path/to/dmr2.modalysis \
  --manifestations M1 M1 \
  --modifications 5MC 5MC_5HMC \
  --manifestation-labels M1 \
  --expression-labels tissue_1 \
  --annotated-gff-path /path/to/gff_annotated.modalysis \
  --gff-path /path/to/formatted_gff.modalysis \
  --merged-pileup-paths /path/to/p1.modalysis /path/to/p2.modalysis \
  --pileup-manifestations M1 M1 \
  --pileup-modifications 5MC 5MC_5HMC \
  --output-path /path/to/output_dir \
  --output-name heatmap \
  [--show-gene-labels] \
  [--effect-signs NEGATIVE NON_NEGATIVE] \
  [--port 8000]

Parameters:

Flag                     Required  Default  Description
--annotated-dmr-paths    Yes       -        Annotated DMR inputs.
--manifestations         Yes       -        Manifestation label per DMR input.
--modifications          Yes       -        Modification label per DMR input.
--manifestation-labels   Yes       -        Canonical manifestation labels.
--expression-labels      Yes       -        Expression labels aligned to manifestation labels.
--annotated-gff-path     Yes       -        Annotated GFF with EXPRESSION.
--gff-path               Yes       -        Formatted GFF for gene coordinates.
--merged-pileup-paths    Yes       -        Merged pileup inputs.
--pileup-manifestations  Yes       -        Manifestation label per merged pileup path.
--pileup-modifications   Yes       -        Modification label per merged pileup path.
--output-path            Yes       -        Output directory.
--output-name            Yes       -        Output prefix (<prefix>_<combo>.png).
--show-gene-labels       No        False    Show gene IDs on y-axis.
--effect-signs           No        both     Restrict to NEGATIVE and/or NON_NEGATIVE.
--port                   No        8000     Server port.

Output:

  • Multiple PNG files like /path/to/output_dir/heatmap_<...>.png


modalysis plot dmr-dotplot

Purpose: Plot DMR positions within promoter/body/enhancer panels for each gene.

Algorithm:

  • Loads expression states and gene coordinates.

  • Converts each DMR to region-relative position:

    • promoter: distance to gene start

    • body: percent through gene body

    • enhancer: distance from gene end

  • Groups positions by manifestation/expression/effect-sign/modification/gene.

  • Renders one 3-panel dotplot per non-empty combination.

  • Draws consensus windows containing many distinct genes.
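The region-relative conversion can be sketched as follows. Using the DMR midpoint as its position, and measuring promoter distance upstream of the gene start, are assumptions made for this illustration; the function name is hypothetical.

```python
def relative_position(region, dmr_start, dmr_end, gene_start, gene_end):
    """Map a DMR to a region-relative x-coordinate for one gene."""
    midpoint = (dmr_start + dmr_end) / 2  # assumption: a DMR is placed at its midpoint
    if region == "PROMOTER":
        return gene_start - midpoint  # bp before the gene start
    if region == "BODY":
        # Percent of the way through the gene body.
        return 100.0 * (midpoint - gene_start) / (gene_end - gene_start)
    if region == "ENHANCER":
        return midpoint - gene_end  # bp past the gene end
    raise ValueError(f"unknown region: {region}")


print(relative_position("PROMOTER", 4800, 4900, 5000, 6000))
print(relative_position("BODY", 5200, 5300, 5000, 6000))
print(relative_position("ENHANCER", 6100, 6300, 5000, 6000))
```

The body branch assumes a gene with non-zero length; a zero-length gene would need a guard before the division.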

Usage:

uv run modalysis plot dmr-dotplot \
  --annotated-dmr-paths /path/to/dmr1.modalysis /path/to/dmr2.modalysis \
  --manifestations M1 M1 \
  --modifications 5MC 5MC_5HMC \
  --manifestation-labels M1 \
  --expression-labels tissue_1 \
  --annotated-gff-path /path/to/gff_annotated.modalysis \
  --gff-path /path/to/formatted_gff.modalysis \
  --output-path /path/to/output_dir \
  --output-name dotplot \
  [--show-gene-labels] \
  [--effect-signs NEGATIVE NON_NEGATIVE] \
  [--port 8000]

Parameters:

Flag                    Required  Default  Description
--annotated-dmr-paths   Yes       -        Annotated DMR inputs.
--manifestations        Yes       -        Manifestation label per DMR input.
--modifications         Yes       -        Modification label per DMR input.
--manifestation-labels  Yes       -        Canonical manifestation labels.
--expression-labels     Yes       -        Expression labels aligned to manifestation labels.
--annotated-gff-path    Yes       -        Annotated GFF with EXPRESSION.
--gff-path              Yes       -        Formatted GFF for coordinates.
--output-path           Yes       -        Output directory.
--output-name           Yes       -        Output prefix (<prefix>_<combo>.png).
--show-gene-labels      No        False    Show gene IDs.
--effect-signs          No        both     Restrict to NEGATIVE and/or NON_NEGATIVE.
--port                  No        8000     Server port.

Output:

  • Multiple PNG files like /path/to/output_dir/dotplot_<...>.png


modalysis plot common-genes-venn

Purpose: Plot Venn diagrams of common negative-DMR genes for two modifications.

Algorithm:

  • From annotated DMR inputs, keeps only rows with negative effect size.

  • Collects gene sets by (manifestation, modification, region).

  • For each manifestation and each region, draws set overlap panel for modification A vs B.

Usage:

uv run modalysis plot common-genes-venn \
  --annotated-dmr-paths /path/to/dmr1.modalysis /path/to/dmr2.modalysis \
  --manifestations M1 M1 \
  --modifications 5MC 5MC_5HMC \
  --modification-a 5MC \
  --modification-b 5MC_5HMC \
  --output-path /path/to/output_dir \
  --output-name common_venn \
  [--port 8000]

Parameters:

Flag                   Required  Default  Description
--annotated-dmr-paths  Yes       -        Annotated DMR inputs.
--manifestations       Yes       -        Manifestation label per DMR input.
--modifications        Yes       -        Modification label per DMR input.
--modification-a       Yes       -        First modification to compare.
--modification-b       Yes       -        Second modification to compare.
--output-path          Yes       -        Output directory.
--output-name          Yes       -        Output basename (.png).
--port                 No        8000     Server port.

Output:

  • /path/to/output_dir/common_venn.png

Troubleshooting

  • ConnectionError / request failures:

    • Ensure uv run modalysis server is running on the same port passed to command --port.

  • Validation errors about list lengths:

    • In DMR/plot aggregation commands, ensure paired list arguments have matching lengths and consistent ordering.

  • Empty/near-empty outputs:

    • Relax thresholds such as --min-score, --max-p-value, --min-file-coverage, --min-reads.

    • Verify chromosome naming in input files matches your allowed chromosome list.

  • ValueError: Modification A and B must be different:

    • Use distinct values for --modification-a and --modification-b.

Testing

Run the full suite:

uv run pytest -q

Run with coverage:

uv run pytest --cov=modalysis --cov-report=term-missing

Run focused suites:

uv run pytest tests/core -q
uv run pytest tests/server -q
uv run pytest tests/client -q
uv run pytest tests/cli -q
uv run pytest tests/e2e -q

Build Docs

Build the Sphinx site:

uv run sphinx-build -b html docs docs/_build/html

Open docs/_build/html/index.html in a browser.

Deploy the built site to Cloudflare Pages with Wrangler:

pnpm wrangler pages deploy docs/_build/html --project-name modalysis