Load and prepare the Module 1 multi-omic object
Source: R/utils_step1_pipeline_helpers.R
Build the data object for the rebuilt Module 1. It can come from cached aligned footprints, or from raw footprint overview files combined with ATAC, RNA and sample-metadata inputs. The returned object is the canonical input for downstream Step 1 TFBS correlation.
Usage
```r
load_prep_multiomic_data(
  config = NULL,
  genome = NULL,
  gene_symbol_col = "HGNC",
  fp_aligned = NULL,
  do_preprocess = FALSE,
  do_motif_clustering = FALSE,
  trim_hocomoco = FALSE,
  fp_root_dir = NULL,
  fp_cache_dir = NULL,
  fp_cache_tag = NULL,
  footprint_sample_scope = "metadata",
  mid_slop = 10L,
  round_digits = 1L,
  score_match_pct = 0.8,
  output_mode = c("full", "distinct"),
  write_outputs = FALSE,
  write_fp_score_qn_csv = TRUE,
  atac_data = NULL,
  rna_tbl = NULL,
  metadata = NULL,
  atac_data_path = NULL,
  rna_path = NULL,
  metadata_path = NULL,
  step1_out_dir_name = "predict_tf_binding_sites",
  label_col,
  expected_n = NULL,
  tf_list = NULL,
  motif_db = NULL,
  threshold_gene_expr = NULL,
  threshold_fp_score = NULL,
  use_parallel = TRUE,
  verbose = TRUE,
  time_log = verbose
)
```
Arguments
- config
Optional YAML config path.
- genome
Optional genome string used to override the config value.
- gene_symbol_col
Gene-symbol column in the RNA table.
- fp_aligned
Optional pre-aligned footprint object.
- do_preprocess
Logical; if `TRUE`, load and align raw footprints before building the object. If `FALSE`, use cached aligned footprints.
- do_motif_clustering
Logical; if `TRUE`, run motif clustering during preprocessing when available.
- trim_hocomoco
Logical; if `TRUE`, trim HOCOMOCO manifests when the trimming helper is available.
- fp_root_dir
Optional root directory for raw footprint overview files.
- fp_cache_dir
Cache directory for aligned footprint files.
- fp_cache_tag
Cache tag, typically the motif database name.
- footprint_sample_scope
Rule for selecting which footprint samples to use; defaults to `"metadata"`.
- mid_slop, round_digits, score_match_pct
Alignment parameters passed to `align_footprints()`.
- output_mode
Output mode for aligned footprints. One of `"full"` or `"distinct"`.
- write_outputs
Logical; if `TRUE`, save the prepared object as an RDS cache under the Step 1 output folder (`step1_out_dir_name`, `predict_tf_binding_sites/` by default).
- write_fp_score_qn_csv
Logical; if `TRUE` and `write_outputs = TRUE`, also save quantile-normalized footprint scores as `01_fp_scores_qn_<db>.csv` under the Module 1 output directory.
- atac_data, rna_tbl, metadata
Optional in-memory input tables.
- atac_data_path, rna_path, metadata_path
Optional explicit file paths for the input tables.
- step1_out_dir_name
Output folder name under `base_dir`.
- label_col
Metadata column used to aggregate matched conditions.
- expected_n
Optional expected matched sample count.
- tf_list
Optional TF allowlist for downstream correlation.
- motif_db
Optional motif metadata table.
- threshold_gene_expr
Expression threshold for Step 1 expression flags.
- threshold_fp_score
Footprint-score threshold for Step 1 bound flags.
- use_parallel
Logical; if `TRUE`, allow parallel work in supported helpers.
- verbose
Logical; if `TRUE`, emit concise progress messages.
- time_log
Logical; if `TRUE`, emit elapsed-time messages. Defaults to the value of `verbose`.
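`write_fp_score_qn_csv` refers to quantile-normalized footprint scores. The sketch below illustrates what quantile normalization does to a score matrix (rows = footprints, columns = samples). It is a generic base-R illustration, not this package's implementation. The function name `quantile_normalize()` is hypothetical, and ties are broken by order of appearance, which is a simplifying assumption.

```r
# Hypothetical sketch (not the package's code): quantile-normalize a
# numeric matrix so that every column (sample) ends up with the same
# distribution of values.
quantile_normalize <- function(scores) {
  stopifnot(is.matrix(scores), is.numeric(scores), !anyNA(scores))
  # Rank within each sample; ties broken by order of appearance
  ranks <- apply(scores, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across samples
  reference <- rowMeans(apply(scores, 2, sort))
  # Replace every value with the reference value at its within-sample rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(scores)
  normalized
}

# Toy footprint scores: 3 footprints x 2 samples (made-up values)
scores <- matrix(c(5, 2, 3, 4, 1, 6), nrow = 3,
                 dimnames = list(paste0("fp", 1:3), c("s1", "s2")))
qn <- quantile_normalize(scores)
print(qn)
```

After normalization, both sample columns contain the same set of values (1.5, 3.5, 5.5). Each footprint keeps its within-sample rank.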
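`threshold_gene_expr` and `threshold_fp_score` drive the Step 1 expression and bound flags. The following is a minimal sketch of turning such thresholds into logical flags. It rests on several assumptions:

- A value at or above the threshold counts as "expressed" or "bound".
- Missing values are treated as not flagged.
- The helper `flag_by_threshold()` is hypothetical.

The package may use a different comparison. It may also take defaults from the config when these arguments are `NULL`.

```r
# Hypothetical sketch: derive logical flags from a single numeric threshold.
# Assumption: ">=" marks a value as passing; NA values are never flagged.
flag_by_threshold <- function(values, threshold) {
  stopifnot(is.numeric(values), is.numeric(threshold), length(threshold) == 1L)
  !is.na(values) & values >= threshold
}

# Made-up example inputs
expr <- c(TF_A = 0.5, TF_B = 3.2, TF_C = NA, TF_D = 10)
fp   <- c(site1 = 0.1, site2 = 0.9, site3 = 0.45)

expressed <- flag_by_threshold(expr, threshold = 1)     # stands in for threshold_gene_expr
bound     <- flag_by_threshold(fp, threshold = 0.45)    # stands in for threshold_fp_score
print(expressed)
print(bound)
```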
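`label_col` names the metadata column used to aggregate matched conditions, and `expected_n` is an optional expected matched-sample count. The sketch below shows one way such grouping could work and is not documented package behavior. It makes these assumptions:

- Sample columns are matched to metadata rows through a `sample_id` column.
- Each condition is summarized by its mean.
- A mismatch with `expected_n` is an error.

The helper `aggregate_by_label()` and the `sample_col` argument are hypothetical.

```r
# Hypothetical sketch: collapse sample columns into one column per condition.
# Assumptions: metadata links samples via `sample_col`; groups are averaged.
aggregate_by_label <- function(mat, metadata, label_col,
                               sample_col = "sample_id", expected_n = NULL) {
  stopifnot(is.matrix(mat), label_col %in% names(metadata),
            sample_col %in% names(metadata))
  # Keep only samples present in both the data matrix and the metadata
  shared <- intersect(colnames(mat), metadata[[sample_col]])
  if (!is.null(expected_n) && length(shared) != expected_n) {
    stop("matched ", length(shared), " samples, expected ", expected_n)
  }
  labels <- metadata[[label_col]][match(shared, metadata[[sample_col]])]
  groups <- split(shared, labels)
  # One column of row means per condition; rownames come from `mat`
  vapply(groups, function(s) rowMeans(mat[, s, drop = FALSE]),
         numeric(nrow(mat)))
}

# Made-up example: 2 peaks x 4 samples, two conditions
mat <- matrix(1:8, nrow = 2,
              dimnames = list(c("peak1", "peak2"), c("s1", "s2", "s3", "s4")))
metadata <- data.frame(sample_id = c("s1", "s2", "s3", "s4"),
                       condition = c("ctrl", "ctrl", "treat", "treat"),
                       stringsAsFactors = FALSE)
agg <- aggregate_by_label(mat, metadata, label_col = "condition", expected_n = 4)
print(agg)
```

Checking the matched count early, as `expected_n` suggests, surfaces sample-ID mismatches between the ATAC/RNA tables and the metadata before any downstream correlation runs.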