Calculates pairwise similarity between documents by weighting terms with TF-IDF and taking the cosine similarity of the resulting document vectors.
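To make the two steps concrete, here is a minimal base-R sketch of TF-IDF weighting followed by cosine similarity. It is an illustration only, not the implementation of calc_doc_sim: the toy corpus, the whitespace tokenizer, and the log(N / df) form of the IDF are all assumptions chosen for brevity.

```r
# Toy corpus (hypothetical data, for illustration only).
docs <- c(
  "cats chase mice",
  "dogs chase cats",
  "stocks rise today"
)

# Tokenize on whitespace after lowercasing (an assumed, simplistic tokenizer).
tokens <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per document, one column per term.
tf <- t(vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
colnames(tf) <- vocab

# Inverse document frequency: log(N / number of documents containing the term).
doc_freq <- colSums(tf > 0)
idf <- log(nrow(tf) / doc_freq)

# TF-IDF weights: scale each term column by its IDF.
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity: normalize rows to unit length, then take dot products.
row_norms <- sqrt(rowSums(tfidf^2))
unit_rows <- tfidf / row_norms
sim <- unit_rows %*% t(unit_rows)
```

In this sketch the first two documents share the terms "cats" and "chase" and so receive a similarity between 0 and 1, while the third document shares no terms with the others and scores exactly 0 against them.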

Usage

calc_doc_sim(
  text_data,
  text_column = "abstract",
  min_term_freq = 2,
  max_doc_freq = 0.9
)

Arguments

text_data

A data frame containing the text to analyze, stored in the column named by text_column.

text_column

Name of the column containing the text to analyze. Defaults to "abstract".

min_term_freq

Minimum frequency a term must reach to be included. Defaults to 2.

max_doc_freq

Maximum document frequency, expressed as a proportion of documents (between 0 and 1), for a term to be included. Defaults to 0.9.
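The effect of the two frequency thresholds can be sketched on a small document-term matrix. This is a hedged illustration rather than the function's actual code: the matrix dtm is made-up data, and the sketch assumes that min_term_freq is compared against a term's total count across the corpus, which the documentation above does not spell out.

```r
# Hypothetical document-term count matrix: rows are documents, columns are terms.
dtm <- matrix(
  c(1, 1, 0, 1,
    1, 0, 0, 2,
    1, 0, 1, 0,
    1, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("the", "rare", "once", "model"))
)

min_term_freq <- 2
max_doc_freq <- 0.9

# Total occurrences of each term across the corpus (assumed interpretation).
term_freq <- colSums(dtm)
# Share of documents in which each term appears at least once.
doc_prop <- colSums(dtm > 0) / nrow(dtm)

# Keep terms that are frequent enough but not present in nearly every document.
keep <- term_freq >= min_term_freq & doc_prop <= max_doc_freq
filtered <- dtm[, keep, drop = FALSE]
colnames(filtered)
```

Here "the" is dropped because it appears in every document, "rare" and "once" are dropped because each occurs only once, and only "model" survives both filters.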

Value

A matrix of pairwise cosine similarities between the documents.
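As a rough sketch of how such a matrix might be used, the snippet below finds the most similar pair of distinct documents. The matrix is hand-written stand-in data, and its square, symmetric layout with document labels as dimnames is an assumption about the return value, not something the documentation confirms.

```r
# Hypothetical 3 x 3 similarity matrix standing in for the return value.
sim_matrix <- matrix(
  c(1.00, 0.21, 0.05,
    0.21, 1.00, 0.62,
    0.05, 0.62, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)

# Ignore self-similarity on the diagonal before searching.
off_diag <- sim_matrix
diag(off_diag) <- NA

# Row and column index of the largest off-diagonal value (first match only).
best <- which(off_diag == max(off_diag, na.rm = TRUE), arr.ind = TRUE)[1, ]
most_similar <- rownames(sim_matrix)[best]
```

Masking the diagonal matters because every document has a similarity of 1 with itself, which would otherwise always win the search.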

Examples

if (FALSE) { # \dontrun{
sim_matrix <- calc_doc_sim(article_data, text_column = "abstract")
} # }