This function sanitizes dictionary terms to ensure they're valid for entity extraction.
Usage
sanitize_dictionary(
  dictionary,
  term_column = "term",
  type_column = "type",
  validate_types = TRUE,
  verbose = TRUE
)
Arguments
- dictionary
A data frame containing dictionary terms.
- term_column
The name of the column containing the terms to sanitize. Defaults to "term".
- type_column
The name of the column containing the entity types. Defaults to "type".
- validate_types
Logical. If TRUE (the default), validates each term against its claimed entity type.
- verbose
Logical. If TRUE (the default), prints information about the filtering process.
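
A minimal sketch of a call, using only the arguments documented above. The toy data frame and its contents are illustrative assumptions, as is the variable name `cleaned`; the shape of the returned value is not described in this section, so the sketch makes no claim about it. The call is guarded so that the snippet still runs when the package providing `sanitize_dictionary()` is not attached.

```r
# Toy dictionary (hypothetical data); column names match the documented defaults.
dictionary <- data.frame(
  term = c("aspirin", "migraine", "headache"),
  type = c("drug", "disease", "symptom"),
  stringsAsFactors = FALSE
)

# Only call the function if it is available in the current session.
if (exists("sanitize_dictionary", mode = "function")) {
  cleaned <- sanitize_dictionary(
    dictionary,
    term_column = "term",    # column holding the terms
    type_column = "type",    # column holding the entity types
    validate_types = TRUE,   # check each term against its claimed type
    verbose = FALSE          # suppress the filtering report
  )
}
```

If the dictionary stores its terms or types under other column names, pass those names through `term_column` and `type_column` instead of renaming the columns.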