Clusters documents by applying K-means to their TF-IDF vectors.

Usage

cluster_docs(
  text_data,
  text_column = "abstract",
  n_clusters = 5,
  min_term_freq = 2,
  max_doc_freq = 0.9,
  random_seed = 42
)

Arguments

text_data

A data frame containing text data.

text_column

Name of the column containing text to analyze.

n_clusters

Number of clusters to create.

min_term_freq

Minimum frequency for a term to be included.

max_doc_freq

Maximum document frequency (as a proportion) for a term to be included.

random_seed

Seed for random number generation (for reproducibility).
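
The two frequency thresholds can be illustrated with a small base R sketch. This is not the function's own code. It assumes that min_term_freq is compared against a term's total count across all documents and that max_doc_freq is compared against the share of documents containing the term; the toy corpus, the tokenizer, and all variable names are hypothetical.

```r
# Toy corpus (illustrative only).
docs <- c(
  "a study of gene expression in yeast",
  "a study of yeast gene regulation",
  "a study of deep image models",
  "a study of deep image networks"
)

# Lowercase and split on anything that is not a letter.
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Document-term count matrix: one row per document, one column per term.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = vocab))),
  numeric(length(vocab))
))
colnames(dtm) <- vocab

term_freq <- colSums(dtm)       # total occurrences across the corpus
doc_freq  <- colMeans(dtm > 0)  # proportion of documents containing the term

# Assumed reading of the two arguments (see the note above).
min_term_freq <- 2
max_doc_freq  <- 0.9
keep <- term_freq >= min_term_freq & doc_freq <= max_doc_freq

filtered <- dtm[, keep, drop = FALSE]
colnames(filtered)
```

Under these assumptions, terms that occur only once are dropped as too rare, and terms that occur in every document ("a", "study", "of") are dropped as too common to separate clusters.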

Value

A data frame with the original data and cluster assignments.

Examples

if (FALSE) { # \dontrun{
# article_data is a data frame with an "abstract" column (not defined here)
clustered_data <- cluster_docs(article_data, text_column = "abstract", n_clusters = 5)
} # }
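
For readers who want to see the general technique, the following is a minimal base R sketch of K-means on TF-IDF vectors. It is not the implementation of cluster_docs(). The toy data, the tokenizer, the TF-IDF variant (relative term frequency times log(N / document frequency)), the nstart value, and the name of the added "cluster" column are all assumptions made for illustration.

```r
# Toy input: six short "abstracts" on two unrelated topics (illustrative only).
article_data <- data.frame(
  id = 1:6,
  abstract = c(
    "gene expression yeast",
    "yeast gene regulation",
    "gene regulation expression",
    "deep image models",
    "image networks deep",
    "deep models networks"
  ),
  stringsAsFactors = FALSE
)

# Tokenize and build a document-term count matrix.
tokens <- strsplit(tolower(article_data$abstract), "[^a-z]+")
vocab <- sort(unique(unlist(tokens)))
counts <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = vocab))),
  numeric(length(vocab))
))
colnames(counts) <- vocab

# TF-IDF: relative term frequency weighted by inverse document frequency.
tf    <- counts / rowSums(counts)
idf   <- log(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# K-means with a fixed seed so the assignment is reproducible.
set.seed(42)
fit <- kmeans(tfidf, centers = 2, nstart = 10)

# Original data plus an assumed "cluster" column.
clustered <- cbind(article_data, cluster = fit$cluster)
clustered
```

The cluster labels themselves (1 or 2) are arbitrary; only the grouping is meaningful. Fixing the seed before kmeans() is what makes repeated runs return the same assignment.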