Pronunciation
Consonant phonemes
English does not have more individual consonant sounds than most languages. However, the interdentals /θ/ and /ð/ (the sounds written th), which are common in English (thin, thing, etc.; the, this, that, etc.), are relatively rare in other languages, even in others of the Germanic family (e.g., English thousand corresponds to German tausend), and they are absent even from some English dialects. Some learners substitute [t] or [d], while others shift to [s] or [z], [f] or [v], or even [ts] or [dz].
Speakers of Japanese, Korean, Chinese and Thai may have difficulty distinguishing [ɹ] and [l]. Speakers of Xiang Chinese may have a similar difficulty distinguishing [n] and [l]. The distinction between [b] and [v] can cause difficulty for native speakers of Spanish, Arabic, Japanese and Korean.
Vowel phonemes
The precise number of distinct vowel sounds depends on the variety of English and on how it is analysed: by one count, for example, Received Pronunciation has twelve monophthongs (single or "pure" vowels), eight diphthongs (double vowels) and two triphthongs (triple vowels), whereas General American has thirteen monophthongs and three diphthongs. Many learners, such as speakers of Spanish, Japanese or Arabic, have fewer vowels, or only pure ones, in their mother tongue, and so may have problems both hearing and pronouncing these distinctions.
Syllable structure
In its syllable structure, English allows a cluster of up to three consonants before the vowel and four after it (e.g., straw, desks, glimpsed). This syllable structure causes problems for speakers of many other languages. Japanese, for example, broadly alternates consonant and vowel sounds, so learners from Japan often insert vowels between the consonants (e.g., desks /desks/ becomes "desukusu" and milk shake /mɪlk ʃeɪk/ becomes "mirukushēku").
Learners from languages where all words end in vowels sometimes make all English words end in vowels, so that make /meɪk/ can come out as [meɪkə]. The learner's task is further complicated by the fact that native speakers may drop consonants in the more complex clusters (e.g., [mʌns] instead of [mʌnθs] for months).
Unstressed vowels - Native English speakers frequently replace almost any vowel in an unstressed syllable with a reduced vowel, often schwa. For example, from has a distinctly pronounced short 'o' sound when it is stressed (e.g., Where are you from?), but when it is unstressed, the vowel reduces to schwa (e.g., I'm from London.). In some cases, unstressed vowels disappear altogether, as in chocolate, which has four syllables in Spanish but only two as pronounced by many Americans: "choc-lit".
Stress in English determines vowel quality more strongly than it does in most other languages (although there are notable exceptions, such as Russian). For example, in some varieties the syllables an, en, in, on and un are pronounced as homophones when unstressed, that is, exactly alike. Native speakers can usually distinguish an able, enable and unable from their position in a sentence, but this is more difficult for inexperienced English speakers. Moreover, learners tend to overpronounce these unstressed vowels, giving their speech an unnatural rhythm.
Stress timing - English tends to be a stress-timed language: stressed syllables occur at roughly equal intervals, no matter how many unstressed syllables come in between. Although some other languages, e.g., German and Russian, are also stress-timed, most of the world's other major languages are syllable-timed, with each syllable taking roughly the same amount of time. Learners from these languages often speak English with a staccato rhythm that native speakers can find disconcerting.
"Stress for emphasis" - students' own languages may not use stress for emphasis as English does.
"Stress for contrast" - stressing the right word or expression. This may not come easily to some non-native speakers.
"Emphatic apologies" - the normally unstressed auxiliary is stressed (I really am very sorry).
In English, about fifty words have two different pronunciations, a strong form and a weak form, depending on whether they are stressed (e.g., can is /kæn/ when stressed but /kən/ when unstressed). They are "grammatical words": pronouns, prepositions, auxiliary verbs and conjunctions. Most students tend to overuse the strong form, which is pronounced with the written vowel.
Connected speech
Phonological processes such as assimilation, elision and epenthesis, together with indistinct word boundaries, can confuse learners listening to natural spoken English. Learners who do not use these processes themselves may also sound too formal.
Grammar
Tense, aspect, and mood - English has a relatively large number of tense-aspect-mood forms with some quite subtle differences, such as the difference between the simple past "I ate" and the present perfect "I have eaten." Progressive and perfect progressive forms add complexity. (See English verbs.)
Functions of auxiliaries - Learners of English tend to find it difficult to manipulate the various ways in which English uses auxiliary verbs. These include negation (e.g. He hasn't been drinking.), inversion with the subject to form a question (e.g. Has he been drinking?), short answers (e.g. Yes, he has.) and tag questions (has he?). A further complication is that the dummy auxiliary do/does/did is added to fulfil these functions in the simple present and simple past (e.g. Does he drink?), but not with the verb to be (e.g. Is he here?).
Modal verbs - English also has a significant number of modal auxiliary verbs which each have a number of uses. For example, the opposite of "You must be here at 8" (obligation) is usually "You don't have to be here at 8" (lack of obligation, choice), while "must" in "You must not drink the water" (prohibition) has a different meaning from "must" in "You must not be a native speaker" (deduction). This complexity takes considerable work for most English language learners to master.
Idiomatic usage - English is reputed to have a relatively high degree of idiomatic usage. For example, the use of different main verb forms in such apparently parallel constructions as "try to learn", "help learn" and "avoid learning" poses difficulty for learners. Another example is the idiomatic distinction between "make" and "do": "make a mistake", not "do a mistake"; and "do a favor", not "make a favor".
Articles - English has a definite article, the, and an indefinite article, a/an. At times English nouns can, or indeed must, be used without an article; this is called the zero article. Some of the differences between the definite, indefinite and zero article are fairly easy to learn, but others are not, particularly since a learner's native language may lack articles or use them differently from English. Although the information conveyed by articles is rarely essential for communication, English uses them frequently (several times in the average sentence), so they require some effort from the learner.
Vocabulary
Phrasal verbs - Phrasal verbs in English can cause difficulties for many learners because many of them have several meanings and follow different syntactic patterns. There are also a number of differences in phrasal verbs between American and British English.
Word derivation - Word derivation in English requires a lot of rote learning. For example, an adjective can be negated with the prefix un- (e.g. unable), in- (e.g. inappropriate), dis- (e.g. dishonest) or a- (e.g. amoral), or with one of a myriad of related but rarer prefixes, many of them modified versions of these four (e.g. im-, il- and ir- as variants of in-).
Size of lexicon - The history of English has resulted in a very large vocabulary drawn essentially from two streams: one from Old English and one from the Norman infusion of Latin-derived terms. (Schmitt & Marsden claim that English has one of the largest vocabularies of any known language.) This inevitably means more work for a learner to master the language.
Collocations - Collocation refers to the tendency of certain words to occur together regularly, for example particular nouns and verbs (ride a bike, drive a car). Native speakers tend to use collocations as ready-made chunks, whereas ESL learners often make collocation errors in writing and speech, which can make their English sound awkward.
Slang and colloquialisms - In most native English-speaking countries, large numbers of slang and colloquial terms are used in everyday speech. Many learners find that classroom English differs significantly from the English spoken in everyday situations. This can be difficult and confusing for learners with little experience of using English in Anglophone countries. Slang terms also differ greatly between regions and can change quickly in response to popular culture, and some phrases can become unintentionally rude if misused.