BERTScore evaluates the quality of generated text by comparing contextualized embeddings from models like BERT against a reference text.
Unlike n-gram overlap metrics such as BLEU, BERTScore measures semantic similarity and is often better at capturing whether meaning is preserved even when the wording differs.
- Recommended model: microsoft/deberta-large-mnli for English.
- Language: the language of the texts, given as an ISO code (e.g., en).
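The greedy token-matching at the core of BERTScore can be sketched in plain NumPy. This is a toy illustration, not the library's implementation: random vectors stand in for BERT's contextual token embeddings, `bertscore_f1` is a hypothetical function name, and real implementations add extras such as IDF weighting and baseline rescaling.

```python
import numpy as np

def bertscore_f1(cand_emb, ref_emb):
    """Toy BERTScore F1 from two (num_tokens, dim) embedding matrices."""
    # Normalize token embeddings so dot products are cosine similarities.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T  # pairwise cosine-similarity matrix

    # Greedy matching: each token pairs with its most similar counterpart.
    precision = sim.max(axis=1).mean()  # candidate tokens -> best reference match
    recall = sim.max(axis=0).mean()     # reference tokens -> best candidate match
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
cand = rng.normal(size=(5, 8))  # 5 candidate tokens, 8-dim embeddings
ref = rng.normal(size=(6, 8))   # 6 reference tokens
print(f"F1: {bertscore_f1(cand, ref):.3f}")
```

An identical candidate and reference yield an F1 of exactly 1.0, since every token's best match is itself; the score degrades as the embedding sets diverge.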