This function scrapes book reviews from Goodreads based on provided book IDs.
Arguments
- book_ids_path
A character string specifying the path to a text file containing Goodreads book IDs (one ID per line, as in the example below).
- num_reviews
An integer specifying the number of reviews to scrape per book. Default is 30.
- use_parallel
A logical value indicating whether to use parallel processing. Default is FALSE.
- num_cores
An integer specifying the number of cores to use for parallel processing. Default is 4.
Examples
# \donttest{
# Create a temporary file with sample book IDs
temp_file <- tempfile(fileext = ".txt")
writeLines(c("1420", "2767052", "10210"), temp_file)
# Run the function (with a small number of reviews to keep the example quick)
reviews <- scrape_reviews(temp_file, num_reviews = 5, use_parallel = FALSE)
#> Total book IDs to process: 3
#> 2024-10-25 03:02:43.758913 scrape_goodreads_reviews: Completed! All book reviews extracted
#> Scraping run time = 8.24790334701538
#> Total books processed: 3
print(head(reviews))
#> # A tibble: 6 × 8
#>   book_id reviewer_id reviewer_name review_content            reviewer_followers
#>   <chr>   <chr>       <chr>         <chr>                                  <dbl>
#> 1 1420    91434473    daph pink ♡   "if you don't ship Hamra…                  3
#> 2 1420    83582       Bill Kerwin   "I don't have any earth-…                 NA
#> 3 1420    44531801    Nayra.Hassan  "متردد في قراءة هاملت..س…                  6
#> 4 1420    10171516    jessica       "shakespeare when pitchi…                 45
#> 5 1420    416390      Paul Bryant   "The Skinhead Hamlet - S…                 11
#> 6 2767052 3672777     Nataliya      "Suzanne Collins has bal…                 14
#> # ℹ 3 more variables: reviewer_total_reviews <dbl>, review_date <chr>,
#> #   review_rating <dbl>
# Clean up: remove the temporary file
file.remove(temp_file)
#> [1] TRUE
# }
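The example above writes the ID file by hand and runs sequentially. As a rough sketch of how the input file and the `num_cores` value might be prepared before a parallel run, the following uses base R only. `write_book_ids()` is a hypothetical helper, not part of the package, and the rule that IDs are digit-only strings is an assumption drawn from the sample IDs above. The final `scrape_reviews()` call is left commented out because it needs network access.

```r
# Hypothetical helper (not part of the package): write a de-duplicated
# ID file with one ID per line, the layout used in the example above.
write_book_ids <- function(ids, path = tempfile(fileext = ".txt")) {
  ids <- unique(trimws(as.character(ids)))
  ids <- ids[nzchar(ids)]
  # Assumption: Goodreads book IDs consist of digits only.
  bad <- ids[!grepl("^[0-9]+$", ids)]
  if (length(bad) > 0) {
    stop("Non-numeric book IDs: ", paste(bad, collapse = ", "))
  }
  writeLines(ids, path)
  path
}

ids_file <- write_book_ids(c("1420", " 2767052", "10210", "1420"))

# Choose a core count: at most 4 (the documented default), at most one
# less than the machine reports, and never below 1.
# detectCores() can return NA, so fall back to a single core.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L
cores <- max(1L, min(4L, available - 1L))

# The scraping call would then look like this (requires network access):
# reviews <- scrape_reviews(ids_file, num_reviews = 5,
#                           use_parallel = cores > 1, num_cores = cores)

# Clean up: remove the temporary file
file.remove(ids_file)
```

Leaving one core free is a common courtesy to the rest of the system rather than a requirement of the function; on a single-core machine the sketch falls back to `use_parallel = FALSE`.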