Core Modules

Chromosome normalization helpers.

modalysis.core.chromosomes.normalize_allowed_chromosomes(allowed_chromosomes: list[str]) → set[str]

Normalize chromosome names to uppercase for case-insensitive filtering.

Parameters:

allowed_chromosomes (list[str])

Return type:

set[str]
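The normalization described above can be illustrated with a minimal sketch. The function name below is a hypothetical stand-in, not the library's implementation, and it assumes that normalization is nothing more than uppercasing each name.

```python
def normalize_chromosomes_sketch(allowed_chromosomes: list[str]) -> set[str]:
    """Illustrative stand-in (assumption): uppercase every chromosome name."""
    return {name.upper() for name in allowed_chromosomes}


# Duplicates that differ only by case collapse into one entry.
allowed = normalize_chromosomes_sketch(["chr1", "Chr1", "chrX"])

# A record's chromosome is uppercased the same way before the membership test.
keep = "ChrX".upper() in allowed
```

Returning a set makes the later membership test constant-time regardless of how many chromosomes are allowed.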

Expression field parsing utilities.

modalysis.core.expression.parse_expression_field(expression_field: str) → dict[str, str]

Parse LABEL: VALUE; … expression text into an uppercase mapping.

Parameters:

expression_field (str)

Return type:

dict[str, str]
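A minimal sketch of parsing the LABEL: VALUE; … layout follows. The function name is hypothetical, and the sketch assumes that both labels and values are uppercased and that entries without a colon are skipped; the library may handle either point differently. The labels in the example are made up.

```python
def parse_expression_sketch(expression_field: str) -> dict[str, str]:
    """Illustrative parser (assumption): split on ';', then on the first ':'."""
    mapping: dict[str, str] = {}
    for entry in expression_field.split(";"):
        if ":" not in entry:
            continue  # skips empty trailing pieces and malformed entries
        label, value = entry.split(":", 1)
        mapping[label.strip().upper()] = value.strip().upper()
    return mapping


parsed = parse_expression_sketch("leaf: up; root: down;")
```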

Gene-region parsing and interval lookup helpers.

class modalysis.core.gene_regions.ChromosomeRegions

Bases: TypedDict

promoter: list[tuple[int, int, str]]
body: list[tuple[int, int, str]]
enhancer: list[tuple[int, int, str]]
promoter_starts: list[int]
body_starts: list[int]
enhancer_starts: list[int]

modalysis.core.gene_regions.parse_gff(gff_path: str) → dict[str, list[tuple[int, int, str]]]

Parse a formatted GFF .modalysis file.

Returns a dict: chromosome -> sorted list of (start, end, gene_id).

Parameters:

gff_path (str)

Return type:

dict[str, list[tuple[int, int, str]]]

modalysis.core.gene_regions.build_gene_regions(genes_by_chromosome: dict[str, list[tuple[int, int, str]]], promoter_upstream: int = 1000, enhancer_downstream: int = 1000) → dict[str, ChromosomeRegions]

Build promoter/body/enhancer region boundaries for annotation lookup.

Parameters:
  • genes_by_chromosome (dict[str, list[tuple[int, int, str]]])

  • promoter_upstream (int)

  • enhancer_downstream (int)

Return type:

dict[str, ChromosomeRegions]
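One way the three region lists and their parallel start lists could be derived is sketched below. This is not the library's code. It assumes half-open intervals, ignores strand, places the promoter immediately upstream of the gene start and the enhancer immediately downstream of the gene end, and clamps promoter starts at 0. All names ending in "sketch" are hypothetical.

```python
from typing import TypedDict

Region = tuple[int, int, str]  # (start, end, gene_id)


class ChromosomeRegionsSketch(TypedDict):
    promoter: list[Region]
    body: list[Region]
    enhancer: list[Region]
    promoter_starts: list[int]
    body_starts: list[int]
    enhancer_starts: list[int]


def build_gene_regions_sketch(
    genes_by_chromosome: dict[str, list[Region]],
    promoter_upstream: int = 1000,
    enhancer_downstream: int = 1000,
) -> dict[str, ChromosomeRegionsSketch]:
    regions: dict[str, ChromosomeRegionsSketch] = {}
    for chromosome, genes in genes_by_chromosome.items():
        # Assumption: promoter = [start - upstream, start), clamped at 0.
        promoter = sorted((max(0, s - promoter_upstream), s, g) for s, _e, g in genes)
        body = sorted(genes)
        # Assumption: enhancer = [end, end + downstream).
        enhancer = sorted((e, e + enhancer_downstream, g) for _s, e, g in genes)
        regions[chromosome] = {
            "promoter": promoter,
            "body": body,
            "enhancer": enhancer,
            # Parallel start lists make binary search possible later.
            "promoter_starts": [r[0] for r in promoter],
            "body_starts": [r[0] for r in body],
            "enhancer_starts": [r[0] for r in enhancer],
        }
    return regions


example = build_gene_regions_sketch({"CHR1": [(500, 900, "G1"), (2000, 3000, "G2")]})
```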

modalysis.core.gene_regions.find_genes_at_position(position: int, region_list: list[tuple[int, int, str]], starts_list: list[int]) → list[str]

Find all gene IDs whose region contains the given position.

Parameters:
  • position (int)

  • region_list (list[tuple[int, int, str]])

  • starts_list (list[int])

Return type:

list[str]

modalysis.core.gene_regions.find_genes_overlapping_interval(interval_start: int, interval_end: int, region_list: list[tuple[int, int, str]], starts_list: list[int]) → list[str]

Find all gene IDs whose region overlaps the given half-open interval [start, end).

Parameters:
  • interval_start (int)

  • interval_end (int)

  • region_list (list[tuple[int, int, str]])

  • starts_list (list[int])

Return type:

list[str]
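The half-open overlap test can be sketched with the standard bisect module. The function below is a hypothetical illustration, not the library's implementation. It assumes region_list is sorted by start, starts_list holds the matching start coordinates in the same order, and regions are themselves half-open.

```python
from bisect import bisect_left

Region = tuple[int, int, str]  # (start, end, gene_id)


def overlapping_genes_sketch(
    interval_start: int,
    interval_end: int,
    region_list: list[Region],
    starts_list: list[int],
) -> list[str]:
    # Regions starting at or after interval_end cannot overlap [start, end),
    # so binary search finds the cut-off index.
    upper = bisect_left(starts_list, interval_end)
    # Among the remaining candidates, a region overlaps if it ends after
    # the interval begins.
    return [gene_id for _start, end, gene_id in region_list[:upper] if end > interval_start]


regions = [(100, 200, "A"), (150, 400, "B"), (500, 600, "C")]
starts = [start for start, _end, _gene in regions]
hits = overlapping_genes_sketch(200, 500, regions, starts)
```

The binary search only bounds the candidates on the right; the scan over the prefix is still linear in the worst case, which a real implementation may tighten.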

Core GFF formatting and expression annotation routines.

modalysis.core.gff.format(input_path: str, output_path: str, output_name: str, allowed_chromosomes: list[str]) → None

Format raw GFF rows into gene-level .modalysis output.

Parameters:
  • input_path (str)

  • output_path (str)

  • output_name (str)

  • allowed_chromosomes (list[str])

Return type:

None

modalysis.core.gff.annotate(gff_path: str, expression_paths: list[str], expression_labels: list[str], output_path: str, output_name: str) → None

Annotate formatted GFF genes with one or more expression sources.

Parameters:
  • gff_path (str)

  • expression_paths (list[str])

  • expression_labels (list[str])

  • output_path (str)

  • output_name (str)

Return type:

None

Core pileup formatting and merge routines.

modalysis.core.pileup.format(input_path: str, output_path: str, output_name: str, allowed_chromosomes: list[str]) → None

Format a raw pileup file into canonical .modalysis columns.

Parameters:
  • input_path (str)

  • output_path (str)

  • output_name (str)

  • allowed_chromosomes (list[str])

Return type:

None

modalysis.core.pileup.merge(pileup_paths: list[str], output_path: str, output_name: str, min_files: int = 2, min_file_coverage: float = 50.0, min_reads: int = 5) → None

Merge formatted pileup files by genomic key and apply coverage/read filters.

Parameters:
  • pileup_paths (list[str])

  • output_path (str)

  • output_name (str)

  • min_files (int)

  • min_file_coverage (float)

  • min_reads (int)

Return type:

None

Core DMR formatting, annotation, and aggregation routines.

modalysis.core.dmr._to_excel_column_name(column_index: int) → str

Convert 1-based integer column index to Excel letter notation.

Parameters:

column_index (int)

Return type:

str
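The conversion is the standard bijective base-26 scheme used by spreadsheet column letters. The sketch below shows that scheme under a hypothetical function name; it is not copied from the library, and the error raised for invalid input is an assumption.

```python
def to_excel_column_name_sketch(column_index: int) -> str:
    """Convert a 1-based column index to letters: 1 -> A, 27 -> AA."""
    if column_index < 1:
        raise ValueError("column_index must be 1 or greater")
    letters: list[str] = []
    while column_index > 0:
        # Subtracting 1 first maps 1..26 onto 0..25, so there is no zero digit.
        column_index, remainder = divmod(column_index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    # Letters were produced least-significant first.
    return "".join(reversed(letters))


names = [to_excel_column_name_sketch(i) for i in (1, 26, 27, 702, 703)]
```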

modalysis.core.dmr._excel_inline_string_cell(row: int, column: int, value: str) → str

Build XML for an inline string cell in an XLSX worksheet.

Parameters:
  • row (int)

  • column (int)

  • value (str)

Return type:

str

modalysis.core.dmr._excel_number_cell(row: int, column: int, value: int) → str

Build XML for a numeric cell in an XLSX worksheet.

Parameters:
  • row (int)

  • column (int)

  • value (int)

Return type:

str
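For orientation, the two cell helpers could produce SpreadsheetML fragments along the following lines. This is a sketch with hypothetical function names; the exact attributes the library emits are not documented here, so the markup below is an assumption based on the general worksheet cell format (an inline string uses t="inlineStr" with an is/t pair, a number uses a v element).

```python
from xml.sax.saxutils import escape


def column_letters_sketch(column: int) -> str:
    """1-based column index to spreadsheet letters (1 -> A, 28 -> AB)."""
    letters: list[str] = []
    while column > 0:
        column, remainder = divmod(column - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def inline_string_cell_sketch(row: int, column: int, value: str) -> str:
    ref = f"{column_letters_sketch(column)}{row}"
    # escape() protects &, < and > so the text cannot break the XML.
    return f'<c r="{ref}" t="inlineStr"><is><t>{escape(value)}</t></is></c>'


def number_cell_sketch(row: int, column: int, value: int) -> str:
    ref = f"{column_letters_sketch(column)}{row}"
    return f'<c r="{ref}"><v>{value}</v></c>'


text_cell = inline_string_cell_sketch(1, 1, "a<b")
count_cell = number_cell_sketch(2, 28, 7)
```

Inline strings avoid the need for a separate shared-strings part, which keeps a hand-written workbook compact.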

modalysis.core.dmr._write_gene_counts_excel(output_path_with_name: Path, manifestation_order: list[str], modification_order: list[str], count_lookup: dict[tuple[str, str, str, str, str], int]) → None

Write a compact XLSX workbook for aggregated DMR gene counts.

Parameters:
  • output_path_with_name (Path)

  • manifestation_order (list[str])

  • modification_order (list[str])

  • count_lookup (dict[tuple[str, str, str, str, str], int])

Return type:

None

modalysis.core.dmr.format(input_path: str, output_path: str, output_name: str, allowed_chromosomes: list[str], min_score: float = 5, max_p_value: float = 0.05, min_pct_a_samples: float = 50.0, min_pct_b_samples: float = 50.0, min_reads: int = 5) → None

Filter and normalize raw DMR rows into the .modalysis schema.

Parameters:
  • input_path (str)

  • output_path (str)

  • output_name (str)

  • allowed_chromosomes (list[str])

  • min_score (float)

  • max_p_value (float)

  • min_pct_a_samples (float)

  • min_pct_b_samples (float)

  • min_reads (int)

Return type:

None

modalysis.core.dmr.annotate(dmr_path: str, gff_path: str, output_path: str, output_name: str) → None

Annotate DMR intervals with overlapping promoter/body/enhancer gene IDs.

Parameters:
  • dmr_path (str)

  • gff_path (str)

  • output_path (str)

  • output_name (str)

Return type:

None

modalysis.core.dmr.gene_counts(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_labels: list[str], expression_labels: list[str], annotated_gff_path: str, output_path: str, output_name: str, output_excel: bool = False) → None

Count unique genes by manifestation/expression/effect/modification/region.

Parameters:
  • annotated_dmr_paths (list[str])

  • manifestations (list[str])

  • modifications (list[str])

  • manifestation_labels (list[str])

  • expression_labels (list[str])

  • annotated_gff_path (str)

  • output_path (str)

  • output_name (str)

  • output_excel (bool)

Return type:

None
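Counting unique genes per five-part key can be sketched with a dictionary of sets. The row layout and the example values below are hypothetical; the real function reads annotated DMR files, whose columns are not described in this reference.

```python
from collections import defaultdict

Key = tuple[str, str, str, str, str]
# Hypothetical row: (manifestation, expression, effect, modification, region, gene_id)
Row = tuple[str, str, str, str, str, str]


def count_unique_genes_sketch(rows: list[Row]) -> dict[Key, int]:
    genes_by_key: dict[Key, set[str]] = defaultdict(set)
    for manifestation, expression, effect, modification, region, gene_id in rows:
        # A set keeps each gene once per key, however many DMR rows hit it.
        genes_by_key[(manifestation, expression, effect, modification, region)].add(gene_id)
    return {key: len(genes) for key, genes in genes_by_key.items()}


rows = [
    ("M1", "UP", "NEGATIVE", "MOD_A", "PROMOTER", "G1"),
    ("M1", "UP", "NEGATIVE", "MOD_A", "PROMOTER", "G1"),  # same gene, second DMR
    ("M1", "UP", "NEGATIVE", "MOD_A", "PROMOTER", "G2"),
    ("M1", "UP", "POSITIVE", "MOD_A", "BODY", "G1"),
]
counts = count_unique_genes_sketch(rows)
```

The resulting mapping has the same shape as the count_lookup argument of _write_gene_counts_excel.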

modalysis.core.dmr.common_genes(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_labels: list[str], expression_labels: list[str], modification_a: str, modification_b: str, annotated_gff_path: str, output_path: str, output_name: str) → None

Find common negative-effect genes across two modifications by region.

Parameters:
  • annotated_dmr_paths (list[str])

  • manifestations (list[str])

  • modifications (list[str])

  • manifestation_labels (list[str])

  • expression_labels (list[str])

  • modification_a (str)

  • modification_b (str)

  • annotated_gff_path (str)

  • output_path (str)

  • output_name (str)

Return type:

None

Formatting helpers for plot labels.

modalysis.core.plots.label_format.format_modification_label(modification: str) → str

Convert normalized modification labels into human-readable labels.

Parameters:

modification (str)

Return type:

str

Mean methylation line-plot generation across regions and chromosomes.

modalysis.core.plots.mean_methylation._find_overlapping_regions(position: int, region_list: list[tuple[int, int, str]], starts_list: list[int]) → bool

Check if a position overlaps with any regions using binary search.

Returns True if the position falls within at least one region. A position overlaps a region if region_start <= position < region_end.

Parameters:
  • position (int)

  • region_list (list[tuple[int, int, str]])

  • starts_list (list[int])

Return type:

bool
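The containment rule region_start <= position < region_end combined with binary search might look like the sketch below. The function name is hypothetical and the code is not the library's; it assumes region_list is sorted by start and starts_list mirrors those starts.

```python
from bisect import bisect_right

Region = tuple[int, int, str]  # (start, end, gene_id)


def position_in_any_region_sketch(
    position: int, region_list: list[Region], starts_list: list[int]
) -> bool:
    # Every region before this index satisfies region_start <= position.
    upper = bisect_right(starts_list, position)
    # The end is exclusive, so position == region_end does not count.
    return any(position < end for _start, end, _gene in region_list[:upper])


regions = [(10, 20, "A"), (30, 40, "B")]
starts = [start for start, _end, _gene in regions]
checks = [position_in_any_region_sketch(p, regions, starts) for p in (5, 10, 20, 35)]
```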

modalysis.core.plots.mean_methylation._accumulate_pileup(merged_pileup_path: str, regions: dict[str, ChromosomeRegions]) → dict[tuple[str, str], list[int]]

Read a merged pileup file and accumulate n_valid_cov and n_mod per (chromosome, region).

Returns:

(chromosome, region_name) -> [sum_n_valid_cov, sum_n_mod]

Return type:

dict

Parameters:
  • merged_pileup_path (str)

  • regions (dict[str, ChromosomeRegions])

modalysis.core.plots.mean_methylation.plot_mean_methylation(gff_path: str, merged_pileup_paths: list[str], labels: list[str], output_path: str, output_name: str, y_min: float = 0.0, y_max: float = 0.1, chromosome_order: list[str] | None = None, plot_title: str | None = None) → None

Generate region-grouped chromosome methylation line plots.

Parameters:
  • gff_path (str)

  • merged_pileup_paths (list[str])

  • labels (list[str])

  • output_path (str)

  • output_name (str)

  • y_min (float)

  • y_max (float)

  • chromosome_order (list[str] | None)

  • plot_title (str | None)

Return type:

None
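The accumulator returned by _accumulate_pileup, (chromosome, region_name) -> [sum_n_valid_cov, sum_n_mod], suggests how a plotted value could be derived. The sketch below uses hypothetical names and pre-parsed records instead of a file, and it assumes the mean is the ratio of summed modified calls to summed valid coverage; the library may compute it differently.

```python
from collections import defaultdict

# Hypothetical pre-parsed record: (chromosome, region_name, n_valid_cov, n_mod)
Record = tuple[str, str, int, int]


def accumulate_sketch(records: list[Record]) -> dict[tuple[str, str], list[int]]:
    totals: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for chromosome, region_name, n_valid_cov, n_mod in records:
        totals[(chromosome, region_name)][0] += n_valid_cov
        totals[(chromosome, region_name)][1] += n_mod
    return dict(totals)


def mean_methylation_sketch(total: list[int]) -> float:
    sum_n_valid_cov, sum_n_mod = total
    # Assumption: a region with no coverage is reported as 0.0.
    return sum_n_mod / sum_n_valid_cov if sum_n_valid_cov else 0.0


totals = accumulate_sketch([
    ("CHR1", "promoter", 10, 1),
    ("CHR1", "promoter", 30, 3),
    ("CHR1", "body", 20, 0),
])
promoter_mean = mean_methylation_sketch(totals[("CHR1", "promoter")])
```

Summing counts before dividing weights each site by its coverage, which differs from averaging per-site fractions.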

Gene-level methylation heatmap generation.

modalysis.core.plots.gene_heatmap._collect_genes_by_combination(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_to_expression_label: dict[str, str], gene_to_expression: dict[str, dict[str, str]]) → dict[tuple[str, str, str, str, str], set[str]]

Read annotated DMR files and collect the set of genes for each (manifestation, expression_profile, effect_sign, modification, region) combination.

Returns:

key -> set of gene_ids

Return type:

dict

Parameters:
  • annotated_dmr_paths (list[str])

  • manifestations (list[str])

  • modifications (list[str])

  • manifestation_to_expression_label (dict[str, str])

  • gene_to_expression (dict[str, dict[str, str]])

modalysis.core.plots.gene_heatmap._accumulate_pileup_per_gene(merged_pileup_path: str, regions: dict[str, ChromosomeRegions]) → dict[tuple[str, str], list[int]]

Read a merged pileup file and accumulate n_valid_cov and n_mod per (gene_id, region_name).

Returns:

(gene_id, region_name) -> [sum_n_valid_cov, sum_n_mod]

Return type:

dict

Parameters:
  • merged_pileup_path (str)

  • regions (dict[str, ChromosomeRegions])

modalysis.core.plots.gene_heatmap.plot_gene_heatmap(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_labels: list[str], expression_labels: list[str], annotated_gff_path: str, gff_path: str, merged_pileup_paths: list[str], pileup_manifestations: list[str], pileup_modifications: list[str], output_path: str, output_name: str, show_gene_labels: bool = False, effect_signs: list[str] | None = None) → None

Render per-combination heatmaps using DMR-selected genes and pileup means.

Parameters:
  • annotated_dmr_paths (list[str])

  • manifestations (list[str])

  • modifications (list[str])

  • manifestation_labels (list[str])

  • expression_labels (list[str])

  • annotated_gff_path (str)

  • gff_path (str)

  • merged_pileup_paths (list[str])

  • pileup_manifestations (list[str])

  • pileup_modifications (list[str])

  • output_path (str)

  • output_name (str)

  • show_gene_labels (bool)

  • effect_signs (list[str] | None)

Return type:

None

DMR position dotplot generation within promoter/body/enhancer regions.

modalysis.core.plots.dmr_dotplot._build_gene_coordinate_lookup(gff_path: str) → dict[str, tuple[str, int, int]]

Build a lookup from gene_id -> (chromosome, start, end) using the formatted GFF file.

Returns:

gene_id (uppercase) -> (chromosome, start, end)

Return type:

dict

Parameters:

gff_path (str)

modalysis.core.plots.dmr_dotplot._collect_dmr_positions(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_to_expression_label: dict[str, str], gene_to_expression: dict[str, dict[str, str]], gene_coords: dict[str, tuple[str, int, int]]) → dict[tuple[str, str, str, str, str, str], list[float]]

Read annotated DMR files and collect the position of each DMR within its gene region.

Returns:

(manifestation, expression_profile, effect_sign, modification, gene_id, region) -> list of float positions

  • For PROMOTER: distance from gene start (-1000 = far upstream, 0 = gene start)

  • For BODY: percentage (0-100)

  • For ENHANCER: distance from gene end (0-1000)

Return type:

dict

Parameters:
  • annotated_dmr_paths (list[str])

  • manifestations (list[str])

  • modifications (list[str])

  • manifestation_to_expression_label (dict[str, str])

  • gene_to_expression (dict[str, dict[str, str]])

  • gene_coords (dict[str, tuple[str, int, int]])
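The three position scales listed under Returns can be illustrated with a small conversion sketch. The function name is hypothetical. The sketch assumes each DMR is reduced to a single coordinate before conversion, that strand is ignored, and that the BODY percentage is measured along the gene from start to end; none of these points is confirmed by this reference.

```python
def relative_position_sketch(
    region: str, dmr_position: int, gene_start: int, gene_end: int
) -> float:
    if region == "PROMOTER":
        # Negative distance: -1000 is far upstream, 0 is the gene start.
        return float(dmr_position - gene_start)
    if region == "BODY":
        length = gene_end - gene_start
        # Assumption: zero-length genes map to 0.0 to avoid division by zero.
        return 100.0 * (dmr_position - gene_start) / length if length else 0.0
    if region == "ENHANCER":
        # Positive distance downstream of the gene end.
        return float(dmr_position - gene_end)
    raise ValueError(f"unknown region: {region}")


promoter_pos = relative_position_sketch("PROMOTER", 4500, 5000, 7000)
body_pos = relative_position_sketch("BODY", 5500, 5000, 7000)
enhancer_pos = relative_position_sketch("ENHANCER", 7250, 5000, 7000)
```

Using region-specific scales lets genes of different lengths share one horizontal axis per region in the dotplot.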

modalysis.core.plots.dmr_dotplot._find_consensus_window(region_points: list[tuple[float, str]], window_size: float, min_genes: int) → tuple[float, float] | None

Find a window containing points from at least min_genes distinct genes.

Parameters:
  • region_points (list[tuple[float, str]])

  • window_size (float)

  • min_genes (int)

Return type:

tuple[float, float] | None

modalysis.core.plots.dmr_dotplot._render_dotplot(gene_positions: dict[str, dict[str, list[float]]], title: str, output_file_path: Path, show_gene_labels: bool = False) → bool

Render a single dotplot PNG.

Parameters:
  • gene_positions (dict[str, dict[str, list[float]]]) – dict of gene_id -> {region -> [positions]} where region is PROMOTER, BODY, or ENHANCER

  • title (str) – plot title string

  • output_file_path (Path) – Path object for output file

  • show_gene_labels (bool)

Return type:

bool

modalysis.core.plots.dmr_dotplot.plot_dmr_dotplot(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], manifestation_labels: list[str], expression_labels: list[str], annotated_gff_path: str, gff_path: str, output_path: str, output_name: str, show_gene_labels: bool = False, effect_signs: list[str] | None = None) → None

Render DMR position dotplots for each manifestation/expression/modification slice.

Parameters:
  • annotated_dmr_paths (list[str])

  • manifestations (list[str])

  • modifications (list[str])

  • manifestation_labels (list[str])

  • expression_labels (list[str])

  • annotated_gff_path (str)

  • gff_path (str)

  • output_path (str)

  • output_name (str)

  • show_gene_labels (bool)

  • effect_signs (list[str] | None)

Return type:

None

Venn plotting for overlapping negative DMR genes across modifications.

modalysis.core.plots.common_genes_venn._collect_negative_gene_sets(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str]) → tuple[dict[tuple[str, str, str], set[str]], list[str]]

Collect per-region gene sets from negative-effect DMR rows only.

Parameters:
  • annotated_dmr_paths (list[str])

  • manifestations (list[str])

  • modifications (list[str])

Return type:

tuple[dict[tuple[str, str, str], set[str]], list[str]]

modalysis.core.plots.common_genes_venn._draw_venn_panel(ax: Axes, set_a: set[str], set_b: set[str], label_a: str, label_b: str, title: str) → None

Draw one two-set Venn-like panel with counts and labels.

Parameters:
  • ax (Axes)

  • set_a (set[str])

  • set_b (set[str])

  • label_a (str)

  • label_b (str)

  • title (str)

Return type:

None

modalysis.core.plots.common_genes_venn.plot_common_genes_venn(annotated_dmr_paths: list[str], manifestations: list[str], modifications: list[str], modification_a: str, modification_b: str, output_path: str, output_name: str) → None

Render regional Venn panels comparing two modifications per manifestation.

Parameters:
  • annotated_dmr_paths (list[str])

  • manifestations (list[str])

  • modifications (list[str])

  • modification_a (str)

  • modification_b (str)

  • output_path (str)

  • output_name (str)

Return type:

None