
Extract and classify entities from text with multi-domain types
Source: R/text_preprocessing.R
extract_entities.Rd
This function extracts entities from text and optionally assigns them to specific semantic categories based on dictionaries.
Usage
extract_entities(
  text_data,
  text_column = "abstract",
  dictionary = NULL,
  case_sensitive = FALSE,
  overlap_strategy = c("priority", "all", "longest"),
  sanitize_dict = TRUE
)
Arguments
- text_data
A data frame containing article text data.
- text_column
Name of the column containing text to process.
- dictionary
Combined dictionary or list of dictionaries for entity extraction.
- case_sensitive
Logical. If TRUE, matching is case-sensitive.
- overlap_strategy
One of "priority", "all", or "longest". Controls how terms that match more than one dictionary are handled.
- sanitize_dict
Logical. If TRUE, sanitizes the dictionary before extraction.
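Examples
The call below is a minimal sketch of how the arguments above fit together, not a definitive reference. The toy articles, the dictionary layout (a data frame with hypothetical `term` and `type` columns), and the choice of "longest" are assumptions made for illustration; the structure the function actually expects for `dictionary` should be checked against the package's dictionary helpers.

```r
# Toy input: one row per article, with the text in an "abstract" column
# (the default value of text_column).
articles <- data.frame(
  id = c("a1", "a2"),
  abstract = c(
    "Migraine patients often show low serum magnesium.",
    "Magnesium sulfate was evaluated for acute migraine."
  ),
  stringsAsFactors = FALSE
)

# ASSUMPTION: a dictionary is a data frame of terms, each with a semantic type.
# The column names "term" and "type" are hypothetical placeholders.
dict <- data.frame(
  term = c("migraine", "magnesium", "magnesium sulfate"),
  type = c("disease", "chemical", "drug"),
  stringsAsFactors = FALSE
)

# Guarded so the snippet still runs when the package is not attached.
if (exists("extract_entities", mode = "function")) {
  entities <- extract_entities(
    text_data = articles,
    text_column = "abstract",
    dictionary = dict,
    case_sensitive = FALSE,        # match regardless of case
    overlap_strategy = "longest",  # one of "priority", "all", "longest"
    sanitize_dict = TRUE           # sanitize the dictionary before extraction
  )
  print(head(entities))
}
```

Passing a single value for `overlap_strategy` makes the intended behavior explicit rather than relying on the default vector shown in Usage.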