Vectorized preprocessing of text — vec_preprocess • LBDiscover

This function preprocesses text data using vectorized operations for better performance.

Usage

vec_preprocess(
  text_data,
  text_column = "abstract",
  remove_stopwords = TRUE,
  custom_stopwords = NULL,
  min_word_length = 3,
  max_word_length = 50,
  chunk_size = 100
)

Arguments

text_data: A data frame containing text data.
text_column: Name of the column containing text to process.
remove_stopwords: Logical. If TRUE, removes stopwords.
custom_stopwords: Character vector of additional stopwords to remove.
min_word_length: Minimum word length to keep.
max_word_length: Maximum word length to keep.
chunk_size: Number of documents to process in each chunk.
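
The sketch below illustrates the kind of filtering these arguments describe. It uses base R only and is not the LBDiscover implementation: the helper name sketch_preprocess, its short stopword list, and the letters-only tokenization rule are all assumptions made for illustration.

```r
# Hypothetical helper, not part of LBDiscover: a base-R illustration of
# stopword removal, word-length limits, and chunked processing.
sketch_preprocess <- function(texts,
                              stopwords = c("the", "and", "of"),
                              min_word_length = 3,
                              max_word_length = 50,
                              chunk_size = 100) {
  # Group document indices into consecutive chunks of at most chunk_size
  idx <- seq_along(texts)
  chunks <- split(idx, ceiling(idx / chunk_size))

  out <- vector("list", length(texts))
  for (chunk in chunks) {
    # Vectorized over the chunk: lowercase, then split on runs of non-letters
    tokens <- strsplit(tolower(texts[chunk]), "[^a-z]+")
    out[chunk] <- lapply(tokens, function(tok) {
      n <- nchar(tok)
      # Keep words within the length bounds that are not stopwords
      tok[n >= min_word_length & n <= max_word_length & !(tok %in% stopwords)]
    })
  }
  out
}

docs <- c("The migraine and the magnesium",
          "Role of fish oil in Raynaud syndrome")
result <- sketch_preprocess(docs, chunk_size = 1)
```

In this sketch chunk_size changes only how many documents are handled per pass, not the result, so the same tokens come back for any chunk size.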

Value

A data frame with processed text.

Examples

if (FALSE) { # \dontrun{
# article_data is assumed to be a data frame with an "abstract" column
processed_data <- vec_preprocess(article_data, text_column = "abstract")
} # }