Scrape book reviews from Goodreads — scrape

This function scrapes reviews from Goodreads for each book ID listed in a file and returns them as a single data frame.

Usage

scrape_reviews(
  book_ids_path,
  num_reviews = 30,
  use_parallel = FALSE,
  num_cores = 4
)

Arguments

book_ids_path: A character string specifying the path to a text file containing Goodreads book IDs, one per line.
num_reviews: An integer specifying the number of reviews to scrape per book. Default is 30.
use_parallel: A logical value indicating whether to use parallel processing. Default is FALSE.
num_cores: An integer specifying the number of cores to use for parallel processing. Default is 4.
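
The arguments above can be prepared along the following lines. This is a minimal sketch, not part of the package: `write_book_ids()` is a hypothetical helper, the numeric-only check is an assumption based on the IDs used in the Examples section, and capping `num_cores` at the number of cores reported by `parallel::detectCores()` is a suggestion rather than documented behavior. The final call is left commented out because it needs network access.

```r
# Hypothetical helper (not part of the package): write book IDs to a text
# file, one ID per line, and return the path to that file.
write_book_ids <- function(ids, path = tempfile(fileext = ".txt")) {
  ids <- unique(trimws(as.character(ids)))
  # Assumption: Goodreads book IDs are purely numeric, as in the examples.
  bad <- !grepl("^[0-9]+$", ids)
  if (any(bad)) {
    stop("Non-numeric book IDs: ", paste(ids[bad], collapse = ", "))
  }
  writeLines(ids, path)
  path
}

ids_path <- write_book_ids(c("1420", "2767052", "10210"))

# Assumption: never request more cores than the machine has, and leave one
# free. detectCores() may return NA, so na.rm = TRUE falls back to 4.
n_cores <- max(1L, min(4L, parallel::detectCores() - 1L, na.rm = TRUE))

# Requires network access, so it is not run here:
# reviews <- scrape_reviews(ids_path, num_reviews = 5,
#                           use_parallel = TRUE, num_cores = n_cores)
```

Writing the IDs through a helper like this keeps the file format consistent with what the Examples section passes to `book_ids_path`.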

Value

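A data frame (printed as a tibble in the example below) with one row per review and eight columns: book_id, reviewer_id, reviewer_name, review_content, reviewer_followers, reviewer_total_reviews, review_date, and review_rating.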

Examples

# \donttest{
# Create a temporary file with sample book IDs
temp_file <- tempfile(fileext = ".txt")
writeLines(c("1420", "2767052", "10210"), temp_file)
# Run the function (with a small number of reviews to keep the example quick)
reviews <- scrape_reviews(temp_file, num_reviews = 5, use_parallel = FALSE)
#> Total book IDs to process: 3
#> 2024-09-03 16:17:20.063063 scrape_goodreads_reviews: Completed! All book reviews extracted
#> Scraping run time = 8.58704137802124
#> Total books processed: 3
print(head(reviews))
#> # A tibble: 6 × 8
#>   book_id reviewer_id reviewer_name review_content            reviewer_followers
#>   <chr>   <chr>       <chr>         <chr>                                  <dbl>
#> 1 1420    91434473    daph pink ♡   "if you don't ship Hamra…                  3
#> 2 1420    83582       Bill Kerwin   "I don't have any earth-…                 NA
#> 3 1420    44531801    Nayra.Hassan  "متردد في قراءة هاملت..س…                  6
#> 4 1420    416390      Paul Bryant   "The Skinhead Hamlet - S…                 11
#> 5 1420    10171516    jessica       "shakespeare when pitchi…                 44
#> 6 2767052 3672777     Nataliya      "Suzanne Collins has bal…                 14
#> # ℹ 3 more variables: reviewer_total_reviews <dbl>, review_date <chr>,
#> #   review_rating <dbl>
# Clean up: remove the temporary file
file.remove(temp_file)
#> [1] TRUE
# }
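
The printed output above shows that reviewer_followers can be NA (row 2), so summaries of the result need to handle missing values. The sketch below rebuilds only the book_id and reviewer_followers columns from the six printed rows, rather than calling scrape_reviews(), so that it runs offline; the name `reviews_head` is introduced here purely for illustration.

```r
# Two of the eight columns, copied from the six rows printed by
# head(reviews) above. The remaining columns are omitted.
reviews_head <- data.frame(
  book_id = c("1420", "1420", "1420", "1420", "1420", "2767052"),
  reviewer_followers = c(3, NA, 6, 11, 44, 14),
  stringsAsFactors = FALSE
)

# Number of reviews per book in this excerpt.
review_counts <- table(reviews_head$book_id)

# Mean follower count per book, skipping missing values.
mean_followers <- tapply(
  reviews_head$reviewer_followers,
  reviews_head$book_id,
  mean,
  na.rm = TRUE
)

print(review_counts)
print(mean_followers)
```

Without `na.rm = TRUE`, the mean for book 1420 would be NA because of the single missing follower count.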