BERTScore evaluates the quality of generated text by comparing contextualized embeddings from models like BERT against a reference text.
Unlike n-gram overlap metrics such as BLEU, BERTScore measures semantic similarity and is often better at capturing whether meaning is preserved even when the wording differs.
- Recommended model: microsoft/deberta-large-mnli for English.
- Language: the language of the texts, given as an ISO code (e.g., en).
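The greedy token-matching at the core of BERTScore can be sketched in plain NumPy. This is a toy illustration, not the library's implementation: random vectors stand in for BERT's contextual token embeddings, `bertscore_f1` is a hypothetical function name, and real implementations add extras such as IDF weighting and baseline rescaling.

```python
import numpy as np

def bertscore_f1(cand_emb, ref_emb):
    """Toy BERTScore F1 from two (num_tokens, dim) embedding matrices."""
    # Normalize token embeddings so dot products are cosine similarities.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T  # pairwise cosine-similarity matrix

    # Greedy matching: each token pairs with its most similar counterpart.
    precision = sim.max(axis=1).mean()  # candidate tokens -> best reference match
    recall = sim.max(axis=0).mean()     # reference tokens -> best candidate match
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
cand = rng.normal(size=(5, 8))  # 5 candidate tokens, 8-dim embeddings
ref = rng.normal(size=(6, 8))   # 6 reference tokens
print(f"F1: {bertscore_f1(cand, ref):.3f}")
```

An identical candidate and reference yield an F1 of exactly 1.0, since every token's best match is itself; the score degrades as the embedding sets diverge.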