Predict transcription factor binding sites from matched footprint and RNA data
Source: R/utils_step1_predict_tfbs.R
Run the Module 1 transcription factor binding site (TFBS) workflow as a single user-facing operation. The function first uses motif-supported footprint-TF (FP-TF) correlations to define high-confidence footprints, then predicts sparse FP-TF binding events for expressed TFs.
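The two stages described above can be sketched in base R on a toy correlation table. This is a minimal illustration under stated assumptions, not the package's implementation. The table columns (`fp`, `tf`, `r`, `motif_supported`), the helper `predict_events()`, and its `restrict_to_high_conf` flag are hypothetical. The flag only mimics what `filter_to_canonical_bound` is documented to do, and a single precomputed correlation per FP-TF pair is assumed for both stages.

```r
# Toy FP-TF correlation table (hypothetical columns and values).
cor_tbl <- data.frame(
  fp = c("fp1", "fp1", "fp1", "fp2", "fp2", "fp3"),
  tf = c("TFA", "TFB", "TFD", "TFA", "TFC", "TFB"),
  r = c(0.62, 0.45, 0.70, 0.15, 0.55, 0.48),
  motif_supported = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper sketching the two-stage logic.
predict_events <- function(tbl, expressed_tfs, r_cutoff = 0.3,
                           restrict_to_high_conf = TRUE) {
  # Stage 1: a footprint is high-confidence if at least one motif-supported
  # TF correlates with it at or above the cutoff.
  stage1 <- tbl$motif_supported & tbl$r >= r_cutoff
  high_conf_fp <- unique(tbl$fp[stage1])

  # Stage 2: keep positively correlated FP-TF pairs for expressed TFs,
  # with or without motif support.
  keep <- tbl$tf %in% expressed_tfs & tbl$r >= r_cutoff
  if (restrict_to_high_conf) {
    # Only score footprints that passed stage 1.
    keep <- keep & tbl$fp %in% high_conf_fp
  }
  tbl[keep, c("fp", "tf", "r"), drop = FALSE]
}

expressed <- c("TFA", "TFB", "TFC")
predict_events(cor_tbl, expressed)
predict_events(cor_tbl, expressed, restrict_to_high_conf = FALSE)
```

With the restriction, only `fp1` qualifies (through the motif-supported `TFA`), so the non-motif TF `TFB` is also predicted there while `fp2` and `fp3` are dropped. `TFD` is never predicted because it is not in the expressed set.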
Usage
predict_tfbs(
omics_data,
out_dir = "predict_tf_binding_sites",
db = "JASPAR2024",
label_col = NULL,
r_cutoff = 0.3,
p_cutoff = NULL,
fdr_cutoff = NULL,
filter_to_canonical_bound = TRUE,
tf_subset = NULL,
write_outputs = TRUE,
write_stats = FALSE,
write_bed = FALSE,
write_qc_report = TRUE,
qc_report_scan = FALSE,
output_format = c("csv", "parquet", "auto"),
return_prediction_stats = NULL,
prediction_return_limit = getOption("craftgrn.module1_prediction_return_limit", 5e+06),
min_non_na = 3L,
cores = NULL,
verbose = TRUE,
time_log = verbose
)
Arguments
- omics_data
CraftGRN multiomic object returned by `load_prep_multiomic_data()`.
- out_dir
Directory where output files are written.
- db
Motif database label used in output metadata.
- label_col
Metadata column used to build condition-level matrices when they are not already present in `omics_data`.
- r_cutoff
Minimum positive correlation required for both motif-supported calls and prediction calls.
- p_cutoff
Optional best-method p-value cutoff. If `NULL`, p-value filtering is disabled.
- fdr_cutoff
Optional best-method adjusted p-value cutoff. If `NULL`, FDR filtering is disabled.
- filter_to_canonical_bound
Logical; if `TRUE`, only footprints with at least one motif-supported TF passing the cutoffs are used for the all-expressed-TF prediction stage.
- tf_subset
Optional subset of TFs to restrict the analysis to.
- write_outputs
Logical; if `TRUE`, write Module 1 output files to `out_dir`.
- write_stats
Logical; if `TRUE`, retain and write the full FP-TF correlation statistics.
- write_bed
Logical; if `TRUE`, also write BED-like genome-browser files for the high-confidence footprints and the in-memory TFBS prediction statistics.
- write_qc_report
Write a Module 1 HTML QC report when outputs are written.
- qc_report_scan
Logical; if `TRUE`, scan the predicted TFBS chunks to build top-TF summaries for the QC report.
- output_format
Output format for the large, streamed chunks of TFBS prediction statistics: one of `"csv"`, `"parquet"`, or `"auto"`.
- return_prediction_stats
Logical; if `TRUE`, return the TFBS prediction statistics table in memory. If `NULL`, large runs that write outputs are streamed to disk and return a manifest instead.
- prediction_return_limit
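Maximum number of predicted events to keep in memory when `return_prediction_stats = NULL` and `write_outputs = TRUE`. Defaults to the `craftgrn.module1_prediction_return_limit` option, or 5e6 (5,000,000) if that option is unset.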
- min_non_na
Minimum number of conditions with finite values for both the footprint and the TF required to compute a correlation.
- cores
Number of worker cores for the dense prediction correlation step. If `NULL`, use available cores.
- verbose
Emit concise progress messages.
- time_log
Logical; if `TRUE`, emit elapsed-time messages. Defaults to the value of `verbose`.
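The cutoff arguments above (`r_cutoff`, `p_cutoff`, `fdr_cutoff`) and `min_non_na` can be illustrated with a small base-R sketch. It assumes Pearson correlation and Benjamini-Hochberg adjustment purely for illustration; this page describes a "best-method" p-value without listing the candidate methods. The helpers `fp_tf_cor()` and `filter_pairs()`, the toy profiles, and the inclusive (`>=`, `<=`) comparisons are all hypothetical.

```r
min_non_na <- 3L

# Hypothetical helper: correlate one footprint profile with one TF profile
# across conditions, using only conditions where both values are finite.
fp_tf_cor <- function(fp, tf, min_non_na = 3L) {
  ok <- is.finite(fp) & is.finite(tf)
  if (sum(ok) < min_non_na) {
    return(c(r = NA_real_, p = NA_real_))
  }
  ct <- stats::cor.test(fp[ok], tf[ok], method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
}

# Hypothetical helper: apply the cutoffs; a NULL cutoff disables that filter.
filter_pairs <- function(tbl, r_cutoff = 0.3, p_cutoff = NULL,
                         fdr_cutoff = NULL) {
  keep <- !is.na(tbl$r) & tbl$r >= r_cutoff
  if (!is.null(p_cutoff)) {
    keep <- keep & !is.na(tbl$p) & tbl$p <= p_cutoff
  }
  if (!is.null(fdr_cutoff)) {
    keep <- keep & !is.na(tbl$fdr) & tbl$fdr <= fdr_cutoff
  }
  tbl[keep, , drop = FALSE]
}

# Toy condition-level profiles (five conditions).
fp <- list(fp1 = c(1, 2, 3, 4, 5), fp2 = c(NA, 1, NA, NA, 2))
tf <- list(TFA = c(1.1, 2.0, 2.9, 4.2, 5.1), TFB = c(5, 3, 4, 1, 2))

# Score every FP-TF pair, then adjust p-values across pairs.
pair_stats <- expand.grid(fp = names(fp), tf = names(tf),
                          stringsAsFactors = FALSE)
pair_stats$r <- NA_real_
pair_stats$p <- NA_real_
for (i in seq_len(nrow(pair_stats))) {
  s <- fp_tf_cor(fp[[pair_stats$fp[i]]], tf[[pair_stats$tf[i]]], min_non_na)
  pair_stats$r[i] <- s[["r"]]
  pair_stats$p[i] <- s[["p"]]
}
pair_stats$fdr <- stats::p.adjust(pair_stats$p, method = "BH")

filter_pairs(pair_stats, r_cutoff = 0.3, fdr_cutoff = 0.05)
```

Here `fp2` has only two conditions with finite values, so its correlations are `NA` and never pass. The negative `fp1`-`TFB` correlation fails `r_cutoff`, leaving `fp1`-`TFA` as the only retained pair.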
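The interaction between `return_prediction_stats`, `write_outputs`, and `prediction_return_limit` can be summarized as a small decision rule. The sketch is hypothetical: `resolve_return_mode()` and its return labels are invented for illustration, and the handling of explicit `TRUE`/`FALSE` values and of runs with `write_outputs = FALSE` is an assumption rather than documented behavior. The default limit reads the same option as the real signature, so it follows `options(craftgrn.module1_prediction_return_limit = ...)`.

```r
# Hypothetical helper sketching how the documented return arguments interact.
resolve_return_mode <- function(
    n_events,
    return_prediction_stats = NULL,
    write_outputs = TRUE,
    prediction_return_limit = getOption(
      "craftgrn.module1_prediction_return_limit", 5e+06
    )) {
  # Assumption: an explicit TRUE/FALSE overrides the size-based rule.
  if (!is.null(return_prediction_stats)) {
    return(if (isTRUE(return_prediction_stats)) "in_memory" else "not_returned")
  }
  # NULL: large output-writing runs stream to disk and return a manifest.
  if (write_outputs && n_events > prediction_return_limit) {
    "manifest"
  } else {
    "in_memory"
  }
}

resolve_return_mode(1e3)
resolve_return_mode(6e6)
resolve_return_mode(6e6, write_outputs = FALSE)
```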