🔍 BERTScore Demo

BERTScore evaluates the quality of generated text by comparing contextualized token embeddings (from models like BERT) of a candidate text against those of a reference text.

Unlike n-gram metrics (e.g., BLEU), BERTScore focuses on semantic similarity
and is often better at capturing whether meaning is preserved.

  1. Enter a reference text (ground truth).
  2. Enter a candidate text (model output or paraphrase).
  3. Click "Compute BERTScore". (A programmatic equivalent is sketched below.)
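
Demos like this one typically wrap the bert-score Python package. Here is a minimal sketch of the equivalent programmatic call (the package name and API below are those of bert-score, not necessarily this demo's internals):

```python
# Minimal sketch using the bert-score package (pip install bert-score).
from bert_score import score

references = ["The cat sat on the mat."]        # ground truth
candidates = ["A cat was sitting on the mat."]  # model output / paraphrase

# score() returns per-pair precision, recall, and F1 as tensors.
# Note the argument order: candidates first, references second.
P, R, F1 = score(candidates, references, lang="en")
print(f"P={P.mean().item():.4f}  R={R.mean().item():.4f}  F1={F1.mean().item():.4f}")
```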
Embedding Model

Recommended: microsoft/deberta-large-mnli for English.

Language Code

Language of the texts as an ISO 639-1 code (e.g., en).
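
Both settings map naturally onto bert-score's model_type and lang arguments (the correspondence to this demo's fields is an assumption). Reusing one scorer object avoids reloading the model for every comparison:

```python
from bert_score import BERTScorer

# Reusable scorer: loads the embedding model once and can score many pairs.
# model_type / lang are assumed to correspond to the "Embedding Model" and
# "Language Code" fields above.
scorer = BERTScorer(model_type="microsoft/deberta-large-mnli", lang="en")
P, R, F1 = scorer.score(
    ["A cat was sitting on the mat."],  # candidates
    ["The cat sat on the mat."],        # references
)
print(f"F1: {F1.mean().item():.4f}")
```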