
Construct the rebuilt Module 1 data object by combining footprints, taken either from cached aligned footprints or from raw footprint overview files, with ATAC, RNA, and sample metadata inputs. The returned object is the canonical input for downstream Step 1 TFBS correlation.

Usage

load_prep_multiomic_data(
  config = NULL,
  genome = NULL,
  gene_symbol_col = "HGNC",
  fp_aligned = NULL,
  do_preprocess = FALSE,
  do_motif_clustering = FALSE,
  trim_hocomoco = FALSE,
  fp_root_dir = NULL,
  fp_cache_dir = NULL,
  fp_cache_tag = NULL,
  footprint_sample_scope = "metadata",
  mid_slop = 10L,
  round_digits = 1L,
  score_match_pct = 0.8,
  output_mode = c("full", "distinct"),
  write_outputs = FALSE,
  write_fp_score_qn_csv = TRUE,
  atac_data = NULL,
  rna_tbl = NULL,
  metadata = NULL,
  atac_data_path = NULL,
  rna_path = NULL,
  metadata_path = NULL,
  step1_out_dir_name = "predict_tf_binding_sites",
  label_col,
  expected_n = NULL,
  tf_list = NULL,
  motif_db = NULL,
  threshold_gene_expr = NULL,
  threshold_fp_score = NULL,
  use_parallel = TRUE,
  verbose = TRUE,
  time_log = verbose
)

Arguments

config

Optional YAML config path.

genome

Optional genome build string (for example `"hg38"`) that overrides the genome set in the config.

gene_symbol_col

Gene-symbol column in the RNA table.

fp_aligned

Optional pre-aligned footprint object.

do_preprocess

Logical; if `TRUE`, load and align raw footprints before building the object. If `FALSE`, use cached aligned footprints.

do_motif_clustering

Logical; if `TRUE`, run motif clustering during preprocessing when available.

trim_hocomoco

Logical; if `TRUE`, trim HOCOMOCO manifests when the trimming helper is available.

fp_root_dir

Optional root directory for raw footprint overview files.

fp_cache_dir

Cache directory for aligned footprint files.

fp_cache_tag

Cache tag, typically the motif database name.
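Together with `do_preprocess`, the cache arguments follow a common cache-or-rebuild pattern. The sketch below illustrates that pattern in base R only. The helper names `fp_cache_path()` and `load_or_rebuild()`, the `fp_aligned_<tag>.rds` file name, and the choice to refresh the cache after a rebuild are all hypothetical; the package may name, locate, and validate its cache files differently.

```r
# Hypothetical illustration of a cache-or-rebuild pattern; these helpers and the
# "fp_aligned_<tag>.rds" file name are assumptions, not the package's internals.
fp_cache_path <- function(cache_dir, cache_tag) {
  file.path(cache_dir, sprintf("fp_aligned_%s.rds", cache_tag))
}

load_or_rebuild <- function(cache_dir, cache_tag, preprocess, build_fn) {
  path <- fp_cache_path(cache_dir, cache_tag)
  if (!preprocess) {
    # Analogous to `do_preprocess = FALSE`: read the cache, never rebuild.
    if (!file.exists(path)) {
      stop("No cached aligned footprints found at: ", path)
    }
    return(readRDS(path))
  }
  # Analogous to `do_preprocess = TRUE`: rebuild, then refresh the cache (assumed).
  obj <- build_fn()
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(obj, path)
  obj
}
```

Failing loudly when the cache is missing keeps the cached path cheap and predictable; whether the real function does the same is not stated on this page.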

footprint_sample_scope

Footprint sample selection rule.

mid_slop, round_digits, score_match_pct

Alignment parameters passed to `align_footprints()`.

output_mode

Output mode for aligned footprints. One of `"full"` or `"distinct"`.

write_outputs

Logical; if `TRUE`, save the prepared object as an RDS cache under the Step 1 output directory (`step1_out_dir_name`, `predict_tf_binding_sites/` by default).

write_fp_score_qn_csv

Logical; if `TRUE` and `write_outputs = TRUE`, also save quantile-normalized footprint scores as `01_fp_scores_qn_<db>.csv` under the Module 1 output directory.
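For readers unfamiliar with quantile normalization, the sketch below shows the textbook version applied to a footprints-by-samples matrix in base R, plus a writer that follows the documented `01_fp_scores_qn_<db>.csv` naming. It is an illustration only: the helper names, the `fp_id` column, and the tie handling are assumptions, and the package's own normalization and CSV layout may differ.

```r
# Textbook quantile normalization of a numeric matrix (rows = footprints,
# columns = samples): each column is mapped onto the mean of the sorted columns.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), nrow(x) >= 2L, !anyNA(x))
  ranks  <- apply(x, 2, rank, ties.method = "first")  # ties broken by order
  sorted <- apply(x, 2, sort)
  ref    <- rowMeans(sorted)                           # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical writer using the documented file-name pattern.
write_qn_csv <- function(qn, out_dir, db) {
  path <- file.path(out_dir, sprintf("01_fp_scores_qn_%s.csv", db))
  utils::write.csv(data.frame(fp_id = rownames(qn), qn, check.names = FALSE),
                   path, row.names = FALSE)
  path
}

# Toy scores for 4 footprints in 3 samples (illustrative values only).
fp <- matrix(c(5, 2, 3, 4,  1, 4, 3, 6,  2, 8, 1, 9), nrow = 4,
             dimnames = list(paste0("fp", 1:4), c("s1", "s2", "s3")))
qn <- quantile_normalize(fp)
csv_path <- write_qn_csv(qn, tempdir(), db = "JASPAR2024")
```

After normalization every sample column contains the same set of values, so samples become directly comparable on a shared score scale.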

atac_data, rna_tbl, metadata

Optional in-memory input tables.

atac_data_path, rna_path, metadata_path

Optional explicit file paths for the input tables.

step1_out_dir_name

Output folder name under `base_dir`.

label_col

Metadata column used to aggregate matched conditions.

expected_n

Optional expected matched sample count.
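A minimal sketch of how `label_col` and `expected_n` could work together as a sanity check. It assumes, purely for illustration, that the values in `label_col` are the sample column names shared by the ATAC and RNA tables; the helper `match_labels()` and the `sample_id`/`condition` columns are hypothetical, and the package's actual matching and aggregation logic is not described on this page.

```r
# Hypothetical helper: labels present in both tables, with an optional count check.
match_labels <- function(metadata, label_col, atac_cols, rna_cols, expected_n = NULL) {
  stopifnot(is.data.frame(metadata), label_col %in% names(metadata))
  labels <- unique(as.character(metadata[[label_col]]))
  labels <- labels[!is.na(labels)]
  # Assumption: a label is "matched" if it names a column in both tables.
  matched <- labels[labels %in% atac_cols & labels %in% rna_cols]
  if (!is.null(expected_n) && length(matched) != expected_n) {
    stop(sprintf("Expected %d matched samples but found %d.",
                 as.integer(expected_n), length(matched)))
  }
  matched
}

# Toy metadata; the fourth sample has no label and is ignored.
meta <- data.frame(sample_id = c("p1", "p2", "p3", "p4"),
                   condition = c("ctrl", "stress", "stress", NA))
match_labels(meta, "condition",
             atac_cols = c("ctrl", "stress", "extra"),
             rna_cols  = c("ctrl", "stress"),
             expected_n = 2)
```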

tf_list

Optional TF allowlist for downstream correlation.

motif_db

Optional motif metadata table.

threshold_gene_expr

Expression threshold for Step 1 expression flags.

threshold_fp_score

Footprint-score threshold for Step 1 bound flags.
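Both thresholds turn continuous values into logical flags. The sketch below assumes a simple `>=` comparison with missing values treated as `FALSE`, applied to toy genes-by-samples and footprints-by-samples matrices. The helper `flag_at_threshold()`, the comparison direction, and the `NA` handling are assumptions rather than documented package behavior.

```r
# Hypothetical helper: flag entries at or above a threshold (`>=` is assumed).
flag_at_threshold <- function(x, threshold) {
  stopifnot(is.numeric(x), is.numeric(threshold), length(threshold) == 1L)
  flags <- x >= threshold          # keeps matrix dims and dimnames
  flags[is.na(flags)] <- FALSE     # assumption: missing values are not flagged
  flags
}

# Toy expression matrix (genes x samples); values are illustrative only.
rna <- matrix(c(0, 12, 30, 5, 15, 8), nrow = 3,
              dimnames = list(c("TF_A", "TF_B", "TF_C"), c("s1", "s2")))
# Toy footprint-score matrix (footprints x samples) with one missing score.
fp <- matrix(c(1.5, 3.2, NA, 2.0), nrow = 2,
             dimnames = list(c("fp1", "fp2"), c("s1", "s2")))

expr_flag  <- flag_at_threshold(rna, threshold = 10)  # cf. threshold_gene_expr
bound_flag <- flag_at_threshold(fp, threshold = 2)    # cf. threshold_fp_score
rowSums(expr_flag)  # samples passing per gene: TF_A 0, TF_B 2, TF_C 1
```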

use_parallel

Logical; if `TRUE`, allow parallel work in supported helpers.

verbose

Logical; if `TRUE`, emit concise progress messages.

time_log

Logical; if `TRUE`, emit elapsed-time messages. Defaults to the value of `verbose`.

Value

A rebuilt Module 1 multi-omic object.

Examples

if (FALSE) { # \dontrun{
omics_data <- load_prep_multiomic_data(
  config = "dev/config/pdac_nutrient_stress_strict_jaspar2024_demo.yaml",
  genome = "hg38",
  label_col = "strict_match_rna",
  do_preprocess = FALSE,
  verbose = TRUE
)
} # }