sanitize_dictionary: Enhanced sanitize dictionary function (LBDiscover)

This function sanitizes dictionary terms to ensure they're valid for entity extraction.

Usage

sanitize_dictionary(
  dictionary,
  term_column = "term",
  type_column = "type",
  validate_types = TRUE,
  verbose = TRUE
)

Arguments

dictionary: A data frame containing dictionary terms.
term_column: The name of the column containing the terms to sanitize.
type_column: The name of the column containing entity types.
validate_types: Logical. If TRUE, validates terms against their claimed type.
verbose: Logical. If TRUE, prints information about the filtering process.

Value

A data frame with sanitized terms.
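
Examples

A minimal sketch of a call, using the signature shown under Usage. The toy dictionary, its terms, and its type labels are invented for illustration. The example also assumes that sanitize_dictionary is exported from LBDiscover and that the package is installed, so the call is guarded and skipped otherwise. Which rows are kept or dropped depends on the package's own rules, so no expected output is shown.

```r
# Toy dictionary: terms and type labels are made up for this example.
dictionary <- data.frame(
  term = c("migraine", "aspirin", "the", "123"),
  type = c("disease", "drug", "disease", "chemical"),
  stringsAsFactors = FALSE
)

# Run only if LBDiscover is available (assumption: the function is exported).
if (requireNamespace("LBDiscover", quietly = TRUE)) {
  cleaned <- LBDiscover::sanitize_dictionary(
    dictionary,
    term_column = "term",    # column holding the terms
    type_column = "type",    # column holding the entity types
    validate_types = TRUE,   # check terms against their claimed type
    verbose = FALSE          # suppress the filtering report
  )
  print(cleaned)
}
```

If the dictionary stores terms and types under different column names, pass those names through term_column and type_column instead of renaming the columns.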