Construct input documents for topic modeling
Source: R/utils_step3_topic_benchmark.R
module3_construct_docs.Rd
Builds and caches the document-level link table, document-term table, sparse document-term matrix, and summary metadata used by Module 3 topic modeling.
Usage
module3_construct_docs(
  filtered_dir,
  output_dir,
  tf_cluster_map = NULL,
  check_repeated_values = FALSE,
  ...
)
Arguments
- filtered_dir
Directory containing Module 3 filtered differential-link CSV files.
- output_dir
Directory where topic input caches are written.
- tf_cluster_map
Optional named vector mapping TF names to motif clusters. Defaults to `NULL`.
- check_repeated_values
Logical. If `TRUE`, warn about repeated, inconsistent term values. The default, `FALSE`, is intended for high-throughput runs; set to `TRUE` for diagnostic audits.
- ...
Additional topic-document construction arguments passed to the internal Module 3 document builder.
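A minimal sketch of how the arguments above might be assembled. The TF names, cluster labels, and directory paths are hypothetical placeholders, and the orientation of `tf_cluster_map` (TF names as vector names, cluster labels as values) is an assumption based on the argument description. The call is guarded so the snippet still runs when the package that provides `module3_construct_docs()` is not loaded.

```r
# Hypothetical TF-to-motif-cluster mapping: vector names are TF names,
# values are cluster labels. All entries are illustrative only.
tf_cluster_map <- c(
  GATA1 = "cluster_01",
  GATA2 = "cluster_01",
  SPI1  = "cluster_02"
)

# Hypothetical directories; replace with real paths.
filtered_dir <- file.path(tempdir(), "module3_filtered")
output_dir   <- file.path(tempdir(), "module3_topic_inputs")

# Run only if the function is available in the current session.
if (exists("module3_construct_docs", mode = "function")) {
  module3_construct_docs(
    filtered_dir = filtered_dir,
    output_dir = output_dir,
    tf_cluster_map = tf_cluster_map,
    check_repeated_values = TRUE  # diagnostic audit; the default is FALSE
  )
}
```

Setting `check_repeated_values = TRUE` here reflects the diagnostic-audit use described above; leave it at the default for routine runs.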