Extracts entities from text and, when a dictionary is supplied, assigns them to semantic categories based on that dictionary.

Usage

extract_entities(
  text_data,
  text_column = "abstract",
  dictionary = NULL,
  case_sensitive = FALSE,
  overlap_strategy = c("priority", "all", "longest"),
  sanitize_dict = TRUE
)

Arguments

text_data

A data frame containing article text data.

text_column

Name of the column in text_data containing the text to process. Defaults to "abstract".

dictionary

Combined dictionary or list of dictionaries for entity extraction.

case_sensitive

Logical. If TRUE, matching is case-sensitive. Defaults to FALSE.

overlap_strategy

How to handle terms that match multiple dictionaries: "priority", "all", or "longest".

sanitize_dict

Logical. If TRUE, sanitizes the dictionary before extraction. Defaults to TRUE.

Value

A data frame with extracted entities, their types, and positions.
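
To make the shape of such a result concrete, the following base R sketch matches dictionary terms against a single string and returns one row per match with its type and character positions. It is an illustration only, not the package's implementation: the helper match_terms(), the dictionary columns term and type, and the output columns entity, entity_type, start, and end are all assumptions chosen for this example.

```r
# Hypothetical helper (not part of the package): find dictionary terms in a
# single string and report their type and 1-based character positions.
match_terms <- function(text, dict, case_sensitive = FALSE) {
  # For case-insensitive matching, lowercase both the text and the terms.
  haystack <- if (case_sensitive) text else tolower(text)
  rows <- lapply(seq_len(nrow(dict)), function(i) {
    needle <- if (case_sensitive) dict$term[i] else tolower(dict$term[i])
    # fixed = TRUE treats the term as a literal string, not a regex.
    hits <- gregexpr(needle, haystack, fixed = TRUE)[[1]]
    if (hits[1] == -1) return(NULL)  # term not found
    data.frame(
      entity = dict$term[i],
      entity_type = dict$type[i],
      start = as.integer(hits),
      end = as.integer(hits) + nchar(needle) - 1L,
      stringsAsFactors = FALSE
    )
  })
  # Returns NULL when no term matches at all.
  do.call(rbind, rows)
}

# Assumed dictionary layout: one row per term with its semantic category.
dict <- data.frame(
  term = c("migraine", "magnesium"),
  type = c("disease", "chemical"),
  stringsAsFactors = FALSE
)

result <- match_terms("Magnesium deficiency is linked to migraine.", dict)
print(result)
```

With case_sensitive = TRUE, the same call would match only "migraine", because the dictionary term "magnesium" differs in case from "Magnesium" in the text. The sketch does not attempt overlap resolution or dictionary sanitization, which extract_entities() controls through overlap_strategy and sanitize_dict.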