Run the Module 1 transcription factor binding site (TFBS) workflow as a single user-facing operation. The function first uses motif-supported footprint-TF (FP-TF) correlations to define high-confidence footprints, then predicts sparse FP-TF binding events for expressed TFs.

Usage

predict_tfbs(
  omics_data,
  out_dir = "predict_tf_binding_sites",
  db = "JASPAR2024",
  label_col = NULL,
  r_cutoff = 0.3,
  p_cutoff = NULL,
  fdr_cutoff = NULL,
  filter_to_canonical_bound = TRUE,
  tf_subset = NULL,
  write_outputs = TRUE,
  write_stats = FALSE,
  write_bed = FALSE,
  write_qc_report = TRUE,
  qc_report_scan = FALSE,
  output_format = c("csv", "parquet", "auto"),
  return_prediction_stats = NULL,
  prediction_return_limit = getOption("craftgrn.module1_prediction_return_limit", 5e+06),
  min_non_na = 3L,
  cores = NULL,
  verbose = TRUE,
  time_log = verbose
)
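
A minimal sketch of a call, assuming `predict_tfbs()` is available in the session and that `omics` (a hypothetical variable name) holds an object returned by `load_prep_multiomic_data()`. The argument values are illustrative only; just the argument names come from the signature above.

```r
# Illustrative argument values; only the argument names come from the signature.
args <- list(
  out_dir = file.path(tempdir(), "predict_tf_binding_sites"),
  r_cutoff = 0.3,
  fdr_cutoff = 0.05,      # enables FDR filtering (disabled when NULL)
  write_bed = TRUE,
  output_format = "csv",
  cores = 2L
)

# `omics` is a hypothetical object name. The call is skipped when either the
# function or the object is not present in the current session.
if (exists("predict_tfbs") && exists("omics")) {
  res <- do.call(predict_tfbs, c(list(omics_data = omics), args))
  print(names(res))
}
```

Arguments left out of the list, such as `p_cutoff`, keep the defaults shown in the signature.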

Arguments

omics_data

CraftGRN multiomic object returned by `load_prep_multiomic_data()`.

out_dir

Output directory.

db

Motif database label used in output metadata.

label_col

Metadata column used to build condition-level matrices when they are missing from `omics_data`.

r_cutoff

Minimum positive correlation used for motif-supported and prediction calls.

p_cutoff

Optional best-method p-value cutoff. If `NULL`, p-value filtering is disabled.

fdr_cutoff

Optional best-method adjusted p-value cutoff. If `NULL`, FDR filtering is disabled.
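
The way `r_cutoff`, `p_cutoff`, and `fdr_cutoff` combine can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the column names (`r`, `p`, `p_adj`), the helper `apply_cutoffs()`, the inclusive comparisons, and the Benjamini-Hochberg adjustment are all hypothetical choices made for the example.

```r
# Hypothetical FP-TF correlation table; column names are assumptions.
stats <- data.frame(
  fp = c("fp1", "fp2", "fp3", "fp4"),
  tf = c("TF_A", "TF_A", "TF_B", "TF_B"),
  r  = c(0.82, 0.25, 0.45, -0.60),
  p  = c(0.001, 0.200, 0.030, 0.004),
  stringsAsFactors = FALSE
)
# Assumption: Benjamini-Hochberg adjustment; the source does not name a method.
stats$p_adj <- p.adjust(stats$p, method = "BH")

# Hypothetical helper: a NULL cutoff disables the corresponding filter.
apply_cutoffs <- function(x, r_cutoff = 0.3, p_cutoff = NULL, fdr_cutoff = NULL) {
  keep <- !is.na(x$r) & x$r >= r_cutoff          # positive correlations only
  if (!is.null(p_cutoff))   keep <- keep & x$p <= p_cutoff
  if (!is.null(fdr_cutoff)) keep <- keep & x$p_adj <= fdr_cutoff
  x[keep, , drop = FALSE]
}

r_only   <- apply_cutoffs(stats)                    # correlation filter alone
with_fdr <- apply_cutoffs(stats, fdr_cutoff = 0.01) # correlation plus FDR filter
```

With the defaults, the strongly negative pair (`fp4`) is dropped even though its p-value is small, because the correlation cutoff applies to positive values only.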

filter_to_canonical_bound

Logical; if `TRUE`, only footprints with at least one motif-supported TF passing the cutoffs are used for the all-expressed-TF prediction stage.
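
As a rough sketch of what this flag controls, the snippet below restricts a set of candidate footprints to those with at least one motif-supported TF passing the correlation cutoff. The data, the column names, and the variable names are hypothetical; the package's internal representation may differ.

```r
# Hypothetical motif-supported FP-TF correlations (one row per footprint-TF pair).
motif_hits <- data.frame(
  fp = c("fp1", "fp1", "fp2", "fp3"),
  tf = c("TF_A", "TF_B", "TF_A", "TF_C"),
  r  = c(0.70, 0.10, 0.20, 0.50),
  stringsAsFactors = FALSE
)
all_fps  <- c("fp1", "fp2", "fp3", "fp4")  # fp4 has no motif-supported TF at all
r_cutoff <- 0.3
filter_to_canonical_bound <- TRUE

# Footprints with at least one motif-supported TF at or above the cutoff.
high_conf <- unique(motif_hits$fp[motif_hits$r >= r_cutoff])

# Footprints carried into the all-expressed-TF prediction stage.
candidates <- if (filter_to_canonical_bound) {
  intersect(all_fps, high_conf)
} else {
  all_fps
}
```

Here `fp1` is kept because one of its two motif-supported TFs passes, while `fp2` and `fp4` are excluded.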

tf_subset

Optional TF subset.

write_outputs

Write Module 1 output files.

write_stats

Retain and write full FP-TF correlation statistics.

write_bed

Write optional BED-like browser files for high-confidence footprints and in-memory TFBS prediction statistics.

write_qc_report

Write a Module 1 HTML QC report when outputs are written.

qc_report_scan

Scan predicted TFBS chunks for top-TF summaries in the QC report.

output_format

Output format for large streamed TFBS prediction statistic chunks.

return_prediction_stats

Return the TFBS prediction statistic table in memory. If `NULL`, large output-writing runs are streamed to disk and return a manifest.

prediction_return_limit

Maximum number of predicted events to keep in memory when `return_prediction_stats = NULL` and `write_outputs = TRUE`.
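
Because the default is read with `getOption()`, the limit can be changed session-wide through `options()` instead of being passed on every call. The sketch below uses only base R; the value `1e6` is an arbitrary illustration.

```r
# Set a session-wide limit; options() returns the previous values invisibly.
old <- options(craftgrn.module1_prediction_return_limit = 1e6)

# This is the same lookup the default argument performs.
limit_during <- getOption("craftgrn.module1_prediction_return_limit", 5e+06)

# Restore the previous state (unsets the option if it was not set before).
options(old)
limit_after <- getOption("craftgrn.module1_prediction_return_limit", 5e+06)
```

R evaluates default arguments when the function runs, so an option set before the call takes effect without further changes.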

min_non_na

Minimum number of finite condition pairs required to compute a correlation.
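
One way to picture this threshold is a guard that counts conditions where both values are finite before correlating. The helper name `cor_if_enough()` is hypothetical, and Pearson correlation is used purely for illustration; the source does not specify which method the package applies.

```r
# Hypothetical helper: return NA when too few finite pairs are available.
cor_if_enough <- function(x, y, min_non_na = 3L) {
  ok <- is.finite(x) & is.finite(y)     # excludes NA, NaN, Inf, and -Inf
  if (sum(ok) < min_non_na) return(NA_real_)
  cor(x[ok], y[ok])                     # Pearson by default
}

too_few <- cor_if_enough(c(1, 2, NA, 4), c(2, 4, 6, NA))  # only 2 finite pairs
enough  <- cor_if_enough(c(1, 2, 3, 4),  c(2, 4, 6, 8))   # 4 finite pairs
```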

cores

Number of worker cores for the dense prediction correlation step. If `NULL`, use available cores.

verbose

Emit concise progress messages.

time_log

Logical; if `TRUE`, emit elapsed-time messages.

Value

A list containing `omics_data`, `high_confidence_footprints`, `motif_supported_correlations`, `prediction_stats`, `reports`, and `parameters`.