Machine translation models are usually evaluated using the BLEU score, and I want to get some intuition for it. What BLEU score does a professional human translator typically get?
I know it depends on the language pair, the translator, etc. I just want to get a sense of the scale.
Edit: To be clear, I am asking about the expected BLEU score in practice. This is not a theoretical question; it is an empirical one.
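
To make clear what I mean by "scale", here is a minimal sketch of sentence-level BLEU as I understand it. It assumes a single reference, whitespace tokenization, and no smoothing; `ngrams` and `sentence_bleu` are names I made up for illustration, not a library API. Real evaluations normally use corpus-level BLEU with standardized tokenization, and scores are often reported multiplied by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU in [0, 1] (illustrative sketch only)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Modified precision: clip each candidate n-gram count by its reference count.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, one zero precision drives the geometric mean to 0.
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions with uniform weights.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
exact_copy = sentence_bleu("the cat sat on the mat", reference)
paraphrase = sentence_bleu("a cat was sitting on the mat", reference)
unrelated = sentence_bleu("completely different words here", reference)
print(exact_copy, paraphrase, unrelated)
```

Only an exact copy of the reference reaches 1.0, and a perfectly reasonable paraphrase scores well below that, so I would not expect a human translator to be anywhere near the top of the range. What I am after is where they actually land.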