This function clusters documents by running k-means on their TF-IDF (term frequency-inverse document frequency) vectors.
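For reference, a common textbook form of the two ingredients is shown below. This page does not say which TF-IDF variant or normalization cluster_docs uses, so treat this as an assumed definition. Here $N$ is the number of documents, $\mathrm{df}(t)$ the number of documents containing term $t$, $\mathrm{tf}(t, d)$ the count of $t$ in document $d$, $\mathbf{x}_d$ the TF-IDF vector of document $d$, $\boldsymbol{\mu}_j$ the mean vector of cluster $C_j$, and $k$ corresponds to n_clusters.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\,\log\frac{N}{\mathrm{df}(t)},
\qquad
\min_{C_1,\dots,C_k} \sum_{j=1}^{k} \sum_{d \in C_j} \lVert \mathbf{x}_d - \boldsymbol{\mu}_j \rVert^2
```

k-means searches for the partition of documents into $k$ clusters that minimizes the within-cluster sum of squared distances to each cluster mean.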
Usage
cluster_docs(
  text_data,
  text_column = "abstract",
  n_clusters = 5,
  min_term_freq = 2,
  max_doc_freq = 0.9,
  random_seed = 42
)
Arguments
- text_data
A data frame containing the documents to cluster, with their text in the column named by text_column.
- text_column
Name of the column in text_data that holds the text to analyze.
- n_clusters
Number of clusters (the k in k-means) to create.
- min_term_freq
Minimum frequency a term must reach to be kept in the vocabulary; rarer terms are dropped.
- max_doc_freq
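Maximum document frequency, as a proportion of documents (between 0 and 1), for a term to be kept; terms that appear in a larger share of documents are dropped as too common.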
- random_seed
Seed for the random number generator, so that the random k-means initialization, and therefore the cluster assignments, are reproducible.
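The arguments above map naturally onto the steps of a pipeline: tokenize, filter the vocabulary, weight by TF-IDF, then run k-means. The base-R function below is a minimal sketch of one such pipeline, not the package's implementation. The name sketch_cluster_docs is hypothetical; tokenization (lowercasing and splitting on non-letters), the reading of min_term_freq as a term's total count across all documents, the weighting tf * log(N / df), and nstart = 10 are all assumptions.

```r
# Hypothetical sketch only; NOT the package's implementation of cluster_docs().
# Assumptions: lowercase + split on non-letters; min_term_freq = total count
# across all documents; TF-IDF = raw count * log(N / df); nstart = 10.
sketch_cluster_docs <- function(text_data, text_column = "abstract",
                                n_clusters = 5, min_term_freq = 2,
                                max_doc_freq = 0.9, random_seed = 42) {
  texts <- tolower(text_data[[text_column]])
  # Tokenize each document into letter-only words, dropping empty strings
  tokens <- lapply(strsplit(texts, "[^a-z]+"), function(x) x[nzchar(x)])
  vocab <- sort(unique(unlist(tokens)))
  # Document-term count matrix: one row per document, one column per term
  counts <- do.call(rbind, lapply(tokens, function(tok)
    as.numeric(table(factor(tok, levels = vocab)))))
  colnames(counts) <- vocab
  n_docs <- nrow(counts)
  doc_freq <- colSums(counts > 0)
  # Keep terms frequent enough overall but present in at most max_doc_freq of documents
  keep <- colSums(counts) >= min_term_freq & doc_freq / n_docs <= max_doc_freq
  counts <- counts[, keep, drop = FALSE]
  # Scale each term column by its inverse document frequency
  tfidf <- sweep(counts, 2, log(n_docs / doc_freq[keep]), `*`)
  # Fix the seed so the random k-means starts are reproducible
  set.seed(random_seed)
  km <- stats::kmeans(tfidf, centers = n_clusters, nstart = 10)
  data.frame(text_data, cluster = km$cluster)
}
```

Running several random starts (nstart) and keeping the best one makes k-means less likely to settle in a poor local minimum; with random_seed fixed, the chosen starts, and hence the assignments, are repeatable.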