Back on September 15, 2010, I wrote a little piece, “Frequency – Scrabble and the actual frequency of letter usage in English”, about what seemed to me a puzzling mismatch between the number of tiles for each letter in Scrabble and my superficial sense of how often letters occur in English. While I was away in Hong Kong in December, I received the following email from David T. Wong:
Excuse me for being very late to comment about the piece you wrote last year regarding the frequency of letters used in the English language. The article was posted on September 15, 2010, but I have only recently read it. Anyway, in the article you had given the assumption that the frequency of letters used in English should be based upon their frequency as occurring in the Concise Oxford Dictionary. I have no objection to using that dictionary for the frequency of letters used in English words, but to assume that the frequency occurring in English words is the same as the frequency occurring in English usage is an absolutely untrue assumption. This is because not all words are used equally.
The simplest way to explain this is that while there is only one word “quiz” in the dictionary, and only one word “the” in the dictionary, why would you expect the two words (and their component letters) to be used with equal frequency? My point is that you will hear (or see) the word “the” (and its component letters) hundreds of times more frequently than you will see or hear the word “quiz”, yet you counted each letter (T, H, E, Q, U, I, Z) only once in your analysis of the dictionary. Most analyses of English usage that I see reveal a rather different ranking for letter frequency from the one you showed in your posting of 15 September 2010.
Thank you for your time in considering my opinion on the matter.
In thinking again about this question, it struck me that in the case of Scrabble, though a word may appear more than once on the board, there is no real connection between the words on the board and the words used in conversation and in writing. Scrabble players choose words based on the letters available to them and on maximizing their score under the rules of the game. One could play Scrabble with just the lexicon of chemistry and find the game works quite well, though with a much smaller universe of likely players.
Thus, for Scrabble, the appropriate question is how often each letter occurs in the lexicon of English, not how often it occurs in a real stream of communication, or “usage”.
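The two ways of counting can be made concrete with a small sketch. The sentence below is invented purely for illustration, and `letter_shares` is a hypothetical helper name; the point is only that counting each distinct word once (the lexicon view) and counting every occurrence (the usage view) give different letter shares.

```python
from collections import Counter

def letter_shares(words):
    """Return each letter's share of all letters in the given word list."""
    counts = Counter(ch for word in words for ch in word.lower() if ch.isalpha())
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

# Invented sample text, not real usage data.
text = "the quiz was the hardest quiz of the term and the class knew it"

tokens = text.split()          # every occurrence counts: the "usage" view
lexicon = sorted(set(tokens))  # each distinct word counts once: the "lexicon" view

by_usage = letter_shares(tokens)
by_lexicon = letter_shares(lexicon)

for letter in "thq":
    print(f"{letter}: lexicon {by_lexicon[letter]:.3f}, usage {by_usage[letter]:.3f}")
```

Because “the” is repeated in the sample, T and H take a larger share in the usage count than in the lexicon count, which is the effect Mr. Wong describes.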
This frequency topic reminded me of how peculiar frequency can be in human language. Consider the voiced English phoneme “th” (ð, as in the word “then”), comparing its frequency in a lexicon of over 70,646 words in an advanced learner’s dictionary with its frequency in spoken text. This phoneme accounts for only 0.12% of the phoneme occurrences in the lexicon, yet makes up 3.56% of the phonemes in spoken usage. Of the 44 phonemes of English, it ranks 43rd by occurrences in the lexicon but 8th in spoken usage.
Below is a chart for further exploration.
RP phonemes in the Advanced Learner’s Dictionary (adapted from: http://myweb.tiscali.co.uk/wordscape/wordlist/phonfreq.html)
phoneme | illustrative keyword | total occurrences in lexicon | words containing the phoneme | % of total occurrences in lexicon | % of occurrences in speech |
ə | another | 31009 | 26813 | 6.29% | 10.74% |
ɪ | bid | 51830 | 37729 | 10.52% | 8.33% |
n | near | 31934 | 27020 | 6.48% | 7.58% |
t | teat | 34260 | 29441 | 6.95% | 6.42% |
d | died | 21275 | 19125 | 4.32% | 5.14% |
s | see | 33922 | 28548 | 6.88% | 4.81% |
l | low | 27373 | 25435 | 5.56% | 3.66% |
ð | then | 596 | 593 | 0.12% | 3.56% |
r | raw | 23069 | 21434 | 4.68% | 3.51% |
m | my | 14823 | 13988 | 3.01% | 3.22% |
k | cake | 22453 | 20308 | 4.56% | 3.09% |
e | bed | 11312 | 10940 | 2.30% | 2.97% |
w | west | 4600 | 4523 | 0.93% | 2.81% |
z | zoo | 19972 | 18808 | 4.05% | 2.46% |
v | vine | 6007 | 5859 | 1.22% | 2.00% |
b | bib | 10907 | 10420 | 2.21% | 1.97% |
aɪ | bite | 7441 | 7236 | 1.51% | 1.83% |
f | fine | 8839 | 8606 | 1.79% | 1.79% |
p | pop | 15553 | 14569 | 3.16% | 1.78% |
ʌ | bud | 7124 | 6917 | 1.45% | 1.75% |
eɪ | bait | 10234 | 10029 | 2.08% | 1.71% |
i | bead | 6721 | 6525 | 1.36% | 1.65% |
əʊ | no | 6685 | 6416 | 1.36% | 1.51% |
h | high | 3699 | 3625 | 0.75% | 1.46% |
æ | bad | 11603 | 11149 | 2.35% | 1.45% |
ɒ | pot | 7960 | 7747 | 1.62% | 1.37% |
ɔ | port | 4730 | 4627 | 0.96% | 1.24% |
ŋ | sing | 9181 | 8958 | 1.86% | 1.15% |
u | boot | 4794 | 4743 | 0.97% | 1.13% |
g | go | 6239 | 6079 | 1.27% | 1.05% |
ʃ | shy | 6117 | 6039 | 1.24% | 0.96% |
j | year | 3560 | 3518 | 0.72% | 0.88% |
ʊ | put | 1977 | 1959 | 0.40% | 0.86% |
ɑ | bard | 4215 | 4141 | 0.86% | 0.79% |
aʊ | cow | 2179 | 2135 | 0.44% | 0.61% |
ʤ | judge | 3869 | 3802 | 0.79% | 0.60% |
ɜ | bird | 3095 | 3083 | 0.63% | 0.52% |
ʧ | chin | 2672 | 2639 | 0.54% | 0.41% |
θ | think | 1602 | 1591 | 0.33% | 0.37% |
eə | bear | 965 | 962 | 0.20% | 0.34% |
ɪə | beer | 4174 | 4034 | 0.85% | 0.21% |
oɪ | boy | 788 | 784 | 0.16% | 0.14% |
ʒ | treasure | 334 | 334 | 0.07% | 0.10% |
ʊə | poor | 1053 | 1053 | 0.21% | 0.06% |
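For readers who want to explore the table, here is a minimal sketch that ranks any phoneme by its share in the lexicon and by its share in speech. The two percentage columns are copied from the table above; the names `TABLE`, `parse`, and `rank` are hypothetical, and ties are given the same rank.

```python
# Columns: phoneme, % of occurrences in lexicon, % of occurrences in speech.
TABLE = """
ə 6.29 10.74
ɪ 10.52 8.33
n 6.48 7.58
t 6.95 6.42
d 4.32 5.14
s 6.88 4.81
l 5.56 3.66
ð 0.12 3.56
r 4.68 3.51
m 3.01 3.22
k 4.56 3.09
e 2.30 2.97
w 0.93 2.81
z 4.05 2.46
v 1.22 2.00
b 2.21 1.97
aɪ 1.51 1.83
f 1.79 1.79
p 3.16 1.78
ʌ 1.45 1.75
eɪ 2.08 1.71
i 1.36 1.65
əʊ 1.36 1.51
h 0.75 1.46
æ 2.35 1.45
ɒ 1.62 1.37
ɔ 0.96 1.24
ŋ 1.86 1.15
u 0.97 1.13
g 1.27 1.05
ʃ 1.24 0.96
j 0.72 0.88
ʊ 0.40 0.86
ɑ 0.86 0.79
aʊ 0.44 0.61
ʤ 0.79 0.60
ɜ 0.63 0.52
ʧ 0.54 0.41
θ 0.33 0.37
eə 0.20 0.34
ɪə 0.85 0.21
oɪ 0.16 0.14
ʒ 0.07 0.10
ʊə 0.21 0.06
"""

LEXICON, SPEECH = 0, 1  # column indexes into each row's (lexicon %, speech %) pair

def parse(table):
    """Map each phoneme to its (lexicon %, speech %) pair."""
    rows = {}
    for line in table.strip().splitlines():
        phoneme, lexicon, speech = line.split()
        rows[phoneme] = (float(lexicon), float(speech))
    return rows

def rank(rows, phoneme, column):
    """Rank 1 is the most frequent; equal percentages share a rank."""
    target = rows[phoneme][column]
    return 1 + sum(1 for values in rows.values() if values[column] > target)

rows = parse(TABLE)
lexicon_rank = rank(rows, "ð", LEXICON)
speech_rank = rank(rows, "ð", SPEECH)
ratio = rows["ð"][SPEECH] / rows["ð"][LEXICON]
print(f"ð: lexicon rank {lexicon_rank}, speech rank {speech_rank}, "
      f"speech share is {ratio:.1f} times its lexicon share")
```

Swapping "ð" for another phoneme, such as "ɪə", shows the opposite pattern: a sound that is relatively common in the lexicon but rare in speech.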