Learn Before
  • n-grams

Perplexity

Perplexity is a probability-based metric for evaluating language models. It can be interpreted as the weighted average number of possible next words that can follow any word, i.e. the weighted average branching factor of the language.

Given a mini-language of ten words "zero, one, ..., nine", where each word occurs with probability 1/10 (a uniform unigram model), the perplexity of any word sequence is the inverse probability normalized by the sequence length, which works out to 10:

\begin{aligned} \mathrm{PP}(W) &= P\left(w_{1} w_{2} \ldots w_{N}\right)^{-\frac{1}{N}} \\ &= \left(\left(\frac{1}{10}\right)^{N}\right)^{-\frac{1}{N}} \\ &= \left(\frac{1}{10}\right)^{-1} \\ &= 10 \end{aligned}
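A minimal sketch of this calculation in Python, assuming a uniform unigram model over the ten digit words (the helper name unigram_perplexity is hypothetical, not from the original). It computes the perplexity in log space for numerical stability and should return 10 for any digit sequence:

```python
import math

def unigram_perplexity(test_words, word_probs):
    # PP(W) = P(w1 ... wN)^(-1/N), computed in log space to avoid underflow.
    n = len(test_words)
    log_prob = sum(math.log(word_probs[w]) for w in test_words)
    return math.exp(-log_prob / n)

# Uniform unigram model over the ten digit words "zero" ... "nine".
digits = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
probs = {w: 1 / len(digits) for w in digits}

# Any digit sequence under this model has perplexity 10.
print(unigram_perplexity(["three", "seven", "zero", "nine"], probs))  # 10.0
```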


Tags

Natural language processing

Data Science

Related
  • Markov used in NLP

  • MLE & Normalizing

  • Perplexity

  • General equation of n-gram model