This function calculates the semantic similarity between two conversations using one of three approaches: TF-IDF, Word2Vec, or GloVe embeddings.

Usage

semantic_similarity(
  conversation1,
  conversation2,
  method = "tfidf",
  model_path = NULL,
  dim = 100,
  window = 5,
  iter = 5
)

Arguments

conversation1

A character string representing the first conversation

conversation2

A character string representing the second conversation

method

A character string specifying the method to use: "tfidf", "word2vec", or "glove" (default: "tfidf")

model_path

A character string specifying the path to a pre-trained GloVe embeddings file (required for the "glove" method; default: NULL)

dim

An integer specifying the dimensionality for Word2Vec embeddings (default: 100)

window

An integer specifying the window size for Word2Vec (default: 5)

iter

An integer specifying the number of iterations for Word2Vec (default: 5)

Value

A numeric value representing the semantic similarity (between 0 and 1)
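
To illustrate how a score in this range can arise, here is a minimal base-R sketch of TF-IDF cosine similarity. It is not the package's implementation: the helper names `tokenize` and `tfidf_cosine`, the tokenization rule, and the smoothed IDF formula are all assumptions made for illustration, and the values returned by `semantic_similarity()` may differ. Because TF-IDF weights are non-negative, the cosine of two such vectors falls between 0 and 1.

```r
# Hypothetical helper (not part of the package): lowercase the text and
# split it on runs of non-alphanumeric characters.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words[nzchar(words)]
}

# Hypothetical helper (not part of the package): cosine similarity of the
# TF-IDF vectors of two texts, treating the two texts as the whole corpus.
tfidf_cosine <- function(text1, text2) {
  docs <- list(tokenize(text1), tokenize(text2))
  vocab <- sort(unique(unlist(docs)))
  if (length(vocab) == 0) return(0)

  # Term frequencies: one row per document, one column per vocabulary term.
  tf <- do.call(rbind, lapply(docs, function(d) {
    as.numeric(table(factor(d, levels = vocab))) / max(length(d), 1)
  }))

  # Smoothed inverse document frequency (an assumed variant).
  df <- colSums(tf > 0)
  idf <- log((1 + length(docs)) / (1 + df)) + 1

  # Weight each term column by its IDF, then take the cosine of the two rows.
  w <- sweep(tf, 2, idf, `*`)
  denom <- sqrt(sum(w[1, ]^2)) * sqrt(sum(w[2, ]^2))
  if (denom == 0) return(0)
  sum(w[1, ] * w[2, ]) / denom
}

tfidf_cosine("the cat sat on the mat", "the cat lay on the rug")
```

In this sketch, texts with no shared tokens score 0 and identical texts score 1, which is why vocabulary overlap matters so much for a TF-IDF-based comparison.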

Examples

conv1 <- "The quick brown fox jumps over the lazy dog"
conv2 <- "A fast auburn canine leaps above an idle hound"
semantic_similarity(conv1, conv2, method = "tfidf")
#> Warning: The 'tfidf' method may not provide highly meaningful results for short conversations or those with little vocabulary overlap. Consider using 'word2vec' or 'glove' methods for more robust results.
#> [1] 0.5
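
The embedding-based methods take the additional arguments described above. The calls below are a hedged usage sketch only: the argument values are arbitrary, `glove_path` is a hypothetical placeholder rather than a file shipped with the package, and no output is shown because results depend on the trained or loaded embeddings.

```r
# Assumes the package providing semantic_similarity() is already attached;
# the calls are skipped when the function is not available.
conv1 <- "The quick brown fox jumps over the lazy dog"
conv2 <- "A fast auburn canine leaps above an idle hound"

# Hypothetical location of a pre-trained GloVe file; replace with a real path.
glove_path <- "path/to/glove_vectors.txt"

if (exists("semantic_similarity", mode = "function")) {
  # Word2Vec: smaller embeddings, narrower window, more training iterations.
  print(semantic_similarity(conv1, conv2, method = "word2vec",
                            dim = 50, window = 3, iter = 10))

  # GloVe: only attempted when the embeddings file is present.
  if (file.exists(glove_path)) {
    print(semantic_similarity(conv1, conv2, method = "glove",
                              model_path = glove_path))
  }
}
```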