There are 9 vowels and 36 diphthongs, 28 of which are native to Estonian. All nine vowels can appear as the first component of a diphthong, but only /ɑ e i o u/ occur as the second component. A vowel characteristic of Estonian is the unrounded back vowel /ɤ/, which may be close-mid back, close back, or close-mid central.
- /y, ø, ɤ, æ/ occur only in initial syllables. Other vowels (/i, u, e, o, ɑ/) can occur in both initial and non-initial syllables, but /o/ occurs in non-initial syllables only in proper names and loanwords.
- The front vowels /y, ø, æ/ are phonetically near-front [ÿ, ø̈, æ̈].
- Before and after /j/, the back vowels /u, ɑ/ can be fronted to [u̟, ɑ̽].
- The unrounded vowel transcribed /ɤ/ can be realized as close back [ɯ], close-mid central [ɘ] or close-mid back [ɤ], depending on the speaker.
- The mid vowels /e, ø, o/ are phonetically close-mid [e, ø, o].
- Word-final /e/ is often realized as mid [e̞].
- The open vowels /æ, ɑ/ are phonetically near-open [æ̈, ɑ̝].
Simple vowels can be inherently short or long, written with single and double vowel letters respectively. Diphthongs are always inherently long. Furthermore, long vowels and diphthongs have two suprasegmental lengths. This is described further below.
- /m, p, pː/ are bilabial, whereas /f, v, fː/ are labiodental.
- The fricatives /f, ʃ/ appear only in loanwords. Some speakers merge /ʃ/ with /s/, realizing both as [s].
- /n/ is realized as velar [ŋ] before a velar consonant (e.g. panga /pɑnkɑ/ [pɑŋɡ̊ɑ] 'bank [gen.sg.]').
- /k, kː/ are velar, whereas /j/ is palatal.
- The stops are voiceless unaspirated, but the short versions can be partially [p̬, t̬, t̬ʲ, k̬] or fully [b, d, dʲ, ɡ] voiced when they appear between vowels.
- In spontaneous speech, word-initial /h/ is usually dropped. It is mostly retained in formal speech, and can be realized as voiced [ɦ] between two voiced sounds.
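The velar assimilation of /n/ noted in the list above can be expressed as a small rewrite rule. The following Python sketch is illustrative only: the function name `assimilate_nasal`, the list-of-segments input and the `VELARS` set are assumptions of this example, and it models only the nasal, not the devoiced [ɡ̊] realization of the following stop.

```python
# Velar consonants that trigger the rule; per the list above, /k, kː/ are velar.
VELARS = {"k"}
LENGTH_MARK = "ː"


def assimilate_nasal(segments):
    """Replace /n/ with [ŋ] when the next segment is a velar consonant."""
    result = []
    for i, seg in enumerate(segments):
        following = segments[i + 1] if i + 1 < len(segments) else ""
        # Strip any length marks so that /kː/ counts as velar too.
        if seg == "n" and following.rstrip(LENGTH_MARK) in VELARS:
            result.append("ŋ")
        else:
            result.append(seg)
    return result


# panga /pɑnkɑ/, given as a list of phoneme strings
print(assimilate_nasal(["p", "ɑ", "n", "k", "ɑ"]))
```

For panga /pɑnkɑ/, the sketch returns the segments p, ɑ, ŋ, k, ɑ. An /n/ before a non-velar consonant or at the end of a word is left unchanged.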
Like the vowels, most consonants can be inherently short or long. For the plosives, this distinction is reflected as a distinction in tenseness/voicing, with short plosives being voiced and long plosives being voiceless. This distinction only applies fully for single consonants after stressed syllables. In other environments, the length or tenseness/voicing distinctions may be neutralised:
- After unstressed syllables or in consonant clusters, only obstruents can be long; all other consonants are always short.
- In consonant clusters, voiced plosives are devoiced when next to another obstruent. That is, voiced plosives only occur next to a vowel or a sonorant.
- Word-initially, obstruents are always voiceless, while the remaining consonants are always short. Recent loanwords may have voiced initial plosives, however.
In addition, long consonants and clusters also have two suprasegmental lengths, like the vowels. This is described below.
Non-phonemic palatalization generally occurs before front vowels. In addition, about 0.15% of the vocabulary features fully phonemic palatalization, where palatalization occurs without the front vowel. A front vowel did historically occur there, but was lost, leaving the palatalization as its only trace (a form of cheshirization). It mostly occurs word-finally, but in some cases it may also occur word-medially. Thus, palatalization does not necessarily need a front vowel, and palatalized vs. plain continuants can be articulated. Palatalization is not indicated in the standard orthography.
The stress in Estonian is usually on the first syllable, as was the case in Proto-Finnic. There are a few exceptions with the stress on the second syllable: aitäh ('thanks'), sõbranna ('female friend'). In loanwords, the original stress can be borrowed as well: ideaal ('ideal'), professor ('professor'). The stress is weak, and as length levels already control an aspect of "articulation intensity", most words appear evenly stressed.
Syllables can be divided into short and long. Syllables ending in a short vowel are short, while syllables ending in a long vowel, diphthong or consonant are long. The length of vowels, consonants and thus syllables is "inherent" in the sense that it is tied to a particular word and is not subject to morphological alternations.
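The short/long classification of syllables described above can be sketched as a simple function. This is a minimal illustration, not an established algorithm: it assumes the word has already been syllabified, that each syllable is given as a list of phoneme strings with long segments marked by ⟨ː⟩, and that two adjacent vowel segments form a diphthong. The names `syllable_weight` and `is_vowel` are hypothetical.

```python
# The nine Estonian vowel phonemes listed at the start of the article.
VOWELS = {"ɑ", "e", "i", "o", "u", "y", "ø", "ɤ", "æ"}
LENGTH_MARK = "ː"


def is_vowel(segment):
    """True if the segment is a vowel, ignoring any length marks."""
    return segment.rstrip(LENGTH_MARK) in VOWELS


def syllable_weight(segments):
    """Classify one syllable (a non-empty list of phoneme strings)."""
    final = segments[-1]
    if not is_vowel(final):
        return "long"   # ends in a consonant
    if final.endswith(LENGTH_MARK):
        return "long"   # ends in a long vowel
    if len(segments) > 1 and is_vowel(segments[-2]):
        return "long"   # ends in a diphthong (two adjacent vowels)
    return "short"      # ends in a single short vowel


print(syllable_weight(["v", "e"]))       # first syllable of vere
print(syllable_weight(["v", "eː"]))      # first syllable of veere
print(syllable_weight(["l", "i", "n"]))  # first syllable of linna, assuming lin.na
```

The sketch classifies only inherent syllable length; it does not distinguish half-long from overlong syllables, which depends on the suprasegmental feature described next.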
All stressed long syllables can possess a suprasegmental length feature. When a syllable has this feature, any long vowel or diphthong in the syllable is lengthened further, as is any long consonant or consonant cluster at the end of that syllable. A long syllable without suprasegmental length is termed "long", "half-long", "light" or "length II" and is denoted in IPA as ⟨ˑ⟩ or ⟨ː⟩. A long syllable with suprasegmental length is termed "overlong", "long", "heavy" or "length III", denoted in IPA as ⟨ː⟩ or ⟨ːː⟩. For consistency, this article employs the terms "half-long" and "overlong" and uses ⟨ː⟩ and ⟨ːː⟩, respectively, to denote them.
Both the regular short-long distinction and the suprasegmental length are distinctive, so that Estonian effectively has three distinctive vowel and consonant lengths, the distinction between the second and third length levels being at a level larger than the phoneme, such as the syllable or the foot. In addition to realizing greater phonetic duration, overlength in modern Estonian involves a pitch distinction where falling pitch is realized in syllables that are overlong and level pitch is realized in syllables that are short or long.
The suprasegmental length is not indicated in the standard orthography, except for the plosives, for which a single voiceless letter represents a half-long consonant while a double voiceless letter represents an overlong consonant. There are many minimal pairs and also some minimal triplets which differ only by length:
- vere /vere/ 'blood [gen. sg.]' (short) – veere /veːre/ 'edge [gen. sg.]' (long) – veere /veːːre/ 'edge [ptv. pl.]', but also 'roll [imp. 2nd sg.]' (overlong)
- lina /linɑ/ 'sheet' (short) – linna /linːɑ/ 'town [gen. sg.]' (long) – linna /linːːɑ/ 'town [ine. sg.]' (overlong)
- kabi /kɑpi/ 'hoof' (short) – kapi /kɑpːi/ 'wardrobe [gen. sg.]' (long) – kappi /kɑpːːi/ 'wardrobe [ine. sg.]' (overlong)
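Under the transcription convention used in this article (no mark for short, ⟨ː⟩ for half-long, ⟨ːː⟩ for overlong), the length degree of a transcribed form can be read off mechanically. The following Python sketch assumes that convention and nothing more; the function name `length_degree` is hypothetical, and the middle degree it reports as "half-long" is the one labelled "long" in the examples above.

```python
import re

LABELS = ("short", "half-long", "overlong")


def length_degree(ipa):
    """Return the length degree implied by the longest run of ː marks."""
    runs = re.findall("ː+", ipa)
    longest = max((len(run) for run in runs), default=0)
    # Zero marks means short, one half-long, two (or more) overlong.
    return LABELS[min(longest, 2)]


for form in ["vere", "veːre", "veːːre", "kɑpi", "kɑpːi", "kɑpːːi"]:
    print(form, length_degree(form))
```

The function inspects only the length marks in the transcription. It does not model the pitch difference between half-long and overlong syllables, nor the fact that the distinction is a property of a unit larger than the phoneme.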
The extra length distinction has a number of origins:
- Single-syllable words are always overlong, if they have a long syllable.
- Overlong syllables appear in strong-grade environments, while half-long syllables appear in weak-grade environments. This is traceable to an earlier (Proto-Finnic) distinction between open and closed syllables: closed syllables shortened and weakened a preceding syllable.
- Syncopation of a medial syllable lengthens the preceding syllable.
- When a consonant disappears altogether in the weak grade, coalescence of the two adjacent vowels produces an overlong syllable.
- Compensatory lengthening in the short illative singular form of nominals produces an overlong syllable, even from an originally short syllable.