A site for spellers, teachers of spelling and reading, and students of english words

Introduction to the CommonWords Database


The CommonWords database consists of three data tables: (i) CommonWords, a list of 6200+ high frequency words; (ii) SSCorrespondences, a list of the 300+ sound-to-spelling correspondences found in those 6200+ words; and (iii) Themes, a list of 72 themes, or topics, for which words in the CommonWords table have been tagged. Detailed descriptions of the fields in these three tables follow:

CommonWords Table

The CommonWords table contains the following fields, which can be used to filter to specialized word lists:

Word. The Word field can be used to filter to words with various letter strings – for instance, “Word contains sh” returns all words with the consonant digraph <sh> anywhere in the word, while “Word ends with sh” returns only those with final <sh>.
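The difference between the two Word filters can be sketched in a few lines of Python. This is only an illustration: the word list is a hypothetical sample, not an extract from CommonWords, and the database's own filter interface is not shown here.

```python
# Hypothetical sample; the real list lives in the Word field of CommonWords.
words = ["ship", "fish", "fashion", "wish", "sit"]

# "Word contains sh": <sh> anywhere in the word.
contains_sh = [w for w in words if "sh" in w]

# "Word ends with sh": only final <sh>.
ends_with_sh = [w for w in words if w.endswith("sh")]

print(contains_sh)   # ['ship', 'fish', 'fashion', 'wish']
print(ends_with_sh)  # ['fish', 'wish']
```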

XP contains the same explications as given in the Words table of the larger Lexis database. For more information on elements given in XP, you should consult the appropriate table in the Lexis database. This XP field can be used to filter to words with various prefixes, bases, suffixes, and procedures. The following are some possible search strings:

To find words that contain the prefix de– : "[de+"
To find words that contain the base fect: "fect"
To find words that contain the suffix –ing : "+ing]"
To find words that contain twinning: "[!aeiouwxy]+[!i]+[aeiouy]"
To find words that contain final <e> deletion: "/e+[eiouy]"
To find words that contain prefixes with assimilation: "/[!aeiouwy]+[!aeiouwy]+"

Correspondences. The Correspondences field gives all sound-to-spelling correspondences found in each word, in order. Due to font limitations the following substitutions represent sounds otherwise represented by non-ASCII characters:

[a1] = [a] as in fat
[a2] = [a] as in fate
[a3r] = [âr] as in fair
[e1] = [e] as in pet
[e2] = [e] as in peat
[e3r] = [êr] as in pier
[o1] = [] as in toss
[o2] = [o] as in toast
[o3r] = [ôr] as in torn
[o4] = [ä] as in Tom
[u1] = [u] as in but
[u2] = [u] as in boot
[u3] = [] as in book
[u4] = [ə] (schwa) as in amid
[u5] = [0] (varies from a full schwa to silent) as with the <a> in total
[l1] = [l] as in lull
[l2] = syllabic [l] as in battle
[n1] = [n] as in nun
[n2] = syllabic [n] as in forgotten
[n3] = [ŋ] (eng) as in sing
[th1] = voiceless <th> as in thin
[th2] = voiced <th> as in this

Square brackets enclose sounds, arrowhead brackets enclose letters, and the equal sign translates to “is spelled.” For instance, at meadow the Correspondences field has the following: “[m]=<m> [e1]=<ea> [d]=<d> [o2]=<ow>”. So if you are dealing with phonics, you can filter to certain correspondences, to certain sounds, or to certain spellings. Thus, you can filter to all the words in which short <e> is spelled <ea> (there are currently 66 in CommonWords), or to all words that contain short <e>, [e1], however it's spelled (currently 933), or to the <ea> spelling (currently 220), which sometimes spells short <e>, sometimes long <e> (as in streak), sometimes long <a> (as in steak), and sometimes schwa (as in ocean).
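The three kinds of filter described above (by correspondence, by sound, by spelling) can be sketched in Python. This is a minimal illustration, not the database's own query mechanism. The meadow entry is taken from the text; the pet entry is an assumed analysis added only so the filters have something to distinguish.

```python
import re

# One "[sound]=<spelling>" pair; tolerates stray spaces around "=".
PAIR = re.compile(r"\[([^\]]+)\]\s*=\s*<([^>]+)>")

def parse_correspondences(field):
    """Return (sound, spelling) pairs in the order they occur."""
    return PAIR.findall(field)

# Hypothetical two-row sample standing in for the Correspondences field.
rows = {
    "meadow": "[m]=<m> [e1]=<ea> [d]=<d> [o2]=<ow>",
    "pet": "[p]=<p> [e1]=<e> [t]=<t>",  # assumed analysis
}
parsed = {word: parse_correspondences(field) for word, field in rows.items()}

# Filter to a correspondence: short <e> spelled <ea>.
short_e_as_ea = [w for w, p in parsed.items() if ("e1", "ea") in p]

# Filter to a sound: short <e>, however it is spelled.
any_short_e = [w for w, p in parsed.items() if any(s == "e1" for s, _ in p)]

# Filter to a spelling: <ea>, whatever sound it spells.
any_ea = [w for w, p in parsed.items() if any(sp == "ea" for _, sp in p)]

print(short_e_as_ea)  # ['meadow']
print(any_short_e)    # ['meadow', 'pet']
print(any_ea)         # ['meadow']
```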

Analysis. Analysis contains a number of orthographically significant features of each word, each of which can be filtered to:

CV# = Consonant + stressed long vowel at end of the word, as in by
CVC# = Consonant + stressed short vowel + consonant at end of the word, as in bat
V.V = Long vowel+vowel with syllable boundary between them, as in lion
VCC = Stressed short vowel+consonant+consonant, as in lettuce
VCCle = Stressed short vowel+consonant+consonant+<le>, as in little
VCle = Stressed long vowel+consonant+<le>, as in bugle
VCr = Stressed long vowel+consonant+<r>, as in secret
VCV = Stressed long vowel+consonant+vowel (including silent final <e>), as in vapor and rate
VCVX = VCV holdout, as in done
VCCX = VCC holdout, as in blind
VCrX = VCr holdout, as in fabric
cd = Contains a consonant digraph
ct = Contains a consonant trigraph
vd = Contains a vowel digraph
vt = Contains a vowel trigraph
CMP = Compound word
TR = Instance of twinning or has derived forms with twinning
DELE = Instance of silent final <e> deletion
FLR = Instance of French Lemon Rule
3VR = Instance of 3rd Vowel Rule
SWR = Instance of Short Word Rule
y>i = Instance of <y> to <i> change, as in tries
i>y = Instance of <i> to <y> change, as in lying
! = Something is unusual in the word's form.

POS. This part-of-speech field indicates the parts of speech that a word can fill. It uses the following codes:

rj = Regular adjectives – that is, those that can take the inflectional suffixes -er, -est.
nj = Nonregular adjectives, those that show comparative and superlative periphrastically not with inflectional suffixes, if they can be compared at all – for instance, admirable.
ij = Irregular adjectives, those few that have different base forms for comparative and superlative, as in good, better, best.
rb = Regular adverbs.
nb = Nonregular adverbs.
rs = Regular substantives—that is, nouns that form plurals with -s or -es – like cat/cats or kiss/kisses.
ns = Nonregular substantives – like goose/geese.
rv = Regular verbs—that is, form past tense with -ed.
nv = Nonregular verbs.
c = Conjunctions.
p = Pronouns.
e = Prepositions.
a = Articles
tl = Past participles
tn = Present participles
in = Interjections

With several words there is not a perfect match between the phonetic analysis in the Correspondences field and the parts of speech in the POS field. For instance, in the Correspondences field the word alternate is analyzed phonetically with a long <a> in the final syllable, which is its pronunciation as a verb. But when used as a noun or adjective, that vowel is destressed to a short <i>. Nevertheless, in the POS field alternate is tagged as verb, and noun, and adjective. One way of thinking about it is that in the Correspondences field we have to settle on one pronunciation, but in the POS field we can take an inclusive view, including heterophonic senses of the word.

Prefixes. Prefixes are listed in two places: (i) the XP field lists any prefixes contained within the listed word, shown with a leading left square bracket; (ii) the Prefixes field lists prefixes that can be added to the listed word. In some cases the word can take a certain prefix only after it has taken certain other affixes. For instance, the word avoid can take the negative prefix un1- only after it has taken the suffix -able, since we do not have the word *unavoid, but we do have unavoidable. A few of the prefixes are numbered to discriminate homographs. To find which prefix each number refers to, see the Prefixes table in the Lexis database.

Suffixes. Suffixes are also listed in two places: (i) the XP field lists any suffixes contained within the listed word, with a following right square bracket; (ii) the Suffixes field lists suffixes that can be added to the listed word, with those in parentheses being suffixes that can be added only after the immediately preceding suffix has been added. For instance, the word flesh can take the comparative suffix -er]02 “more” only after it has taken the adjective suffix -y]1: fleshier but not *flesher “more flesh”. The ends of strings of embedded suffixes are marked with a double right parenthesis. Regular nouns, verbs, adverbs, and adjectives can also add the normal inflectional suffixes, though they are not all listed in the Suffixes column. In some cases different suffixes can be affixed to two different senses of a homographic stem. For instance, at the word camp the suffixes -aign and -er01 can only be added to camp1 with the sense “Field, temporary dwelling” while -y1 can only be added to camp2 with the sense “Humorous banality”. Many of the suffixes are numbered. To find which suffix each number refers to, see the Suffixes table in the Lexis database.

Rank. This field is meant to help in deciding when to introduce certain words to students. It is based on the Thorndike-Lorge Teacher’s Word Book of 30,000 Words (New York: Teachers College Press, 1944, 1972), which suggests appropriate grade levels. The ranks are cumulative: “A” would include “AA”; “B” would include “A” and “AA”, etc. Obviously this Rank and the following Iowa ranking are both quite approximate:

AA = Appropriate for grades 1-2.
A = Appropriate for grade 3
B = Appropriate for grade 4 (A T-L score of 49-20)
C = Appropriate for grades 5-6 (A T-L score of 19-10)
D = Appropriate for grades 7-8 (A T-L score of 9-6)

Words with a T-L score of less than 6 are tagged with that score but not assigned to a grade level.
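The cumulative reading of the Rank codes, and the score ranges given for B, C, and D, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and because no T-L score boundaries are given above for AA and A, the sketch returns None for scores of 50 or more rather than guessing.

```python
# Rank codes in order of increasing grade level, as listed above.
RANK_ORDER = ["AA", "A", "B", "C", "D"]

def ranks_through(rank):
    """Return every rank code covered by `rank`, read cumulatively."""
    return RANK_ORDER[: RANK_ORDER.index(rank) + 1]

def rank_from_tl_score(score):
    """Map a T-L score to B, C, or D using the ranges given above.

    Returns None for scores below 6 (tagged with the score only) and
    for scores of 50 or more (AA/A boundaries are not stated here).
    """
    if 20 <= score <= 49:
        return "B"
    if 10 <= score <= 19:
        return "C"
    if 6 <= score <= 9:
        return "D"
    return None

print(ranks_through("B"))      # ['AA', 'A', 'B']
print(rank_from_tl_score(12))  # C
```

A filter for "words suitable through grade 4" would then keep any word whose Rank is in ranks_through("B").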

Iowa. Suggests the level of difficulty for some of the words in the database, based on the percentages of fourth graders who spelled the given word correctly in The New Iowa Spelling Scale (Iowa City: State University of Iowa, nd). A suggested categorization would be:

0-19 = Very hard
20-39 = Hard
40-59 = Medium
60-79 = Easy
80- = Very Easy
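
The suggested categorization can be written as a small lookup. This is a sketch only: the function name is hypothetical, and the upper bound of 100 for the last band is an assumption based on the values being percentages.

```python
def iowa_category(percent_correct):
    """Map the percentage of fourth graders who spelled a word correctly
    to the suggested difficulty label."""
    if not 0 <= percent_correct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_correct < 20:
        return "Very hard"
    if percent_correct < 40:
        return "Hard"
    if percent_correct < 60:
        return "Medium"
    if percent_correct < 80:
        return "Easy"
    return "Very Easy"

print(iowa_category(35))  # Hard
print(iowa_category(80))  # Very Easy
```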

Characters. The number of characters (letters, punctuation, and blank spaces) in each word.

Syllables. The number of syllables in each word.

Themes. In this field over 4800 words are tagged for the various themes, or topics, with which they can be associated. It is intended to be useful for generating word lists dealing with a common theme, such as “colors” or “sports”. There is nothing very authoritative or exhaustive about these groupings. Subjective judgments abound and occasional violence is done to some formal, scientific categories. All I can say is that on at least one day one retired English teacher saw each word belonging to the various themes for which it was tagged. Due to the widespread homography in English, as a given form moves from one theme to another it often becomes a different word. For instance, the form <lime> in the Fruits theme is a homograph of the form <lime> in the Materials theme – that is, an entirely different word with the same spelling.

The full list of all 71 themes can be found in the Themes datasheet. In that listing a right parenthesis is used to divide a group name from the tag. Thus, in “Art) Music” the group is Art; the word to the right of the parenthesis is the tag name used in the Themes field of CommonWords. The following comments may be in order. The tag names are in bold face:

The Countries theme includes countries, continents and nationalities. The Location theme includes both locations and directions. The Occupation theme includes occupations and titles. The Health theme includes words dealing with health, sickness, and death.

Several themes are parts of larger groups:

The Animals group consists of four themes: Birds, Insects, Mammals and the more miscellaneous Animals, for everything else of or about an animal nature. Similarly, the Art group consists of four themes: Literature, Music, Visual arts, and plain Art for words that cut across all three types or are difficult to assign to any of the three specific arts. The Feelings group is divided into three themes: feelings or emotions that can in some sense be said to be Negative, those that can be said to be Positive, with more ambiguous or neutral emotions tagged simply Feeling.

The Food group is divided into seven themes: Drink, Fruit (including nuts), Grains (including bread), Meat (including dairy products, fish, and poultry), Sweets, Vegetables, and more generally, Food. The Government group is divided into two themes: government People and more general Government. The Math group includes Numbers and more general mathematical terms, tagged Math. The Measure group includes Amount (including sizes), Calculation consisting of calculated measurements, Units of measurement, Value consisting of measurements to which we ascribe subjective values, and Measure. The Military group consists of military Paraphernalia and equipment, military Personnel, and Military. The Science group consists of Biology, Chemistry, Geography (features and places), and Science. The Sports group consists of sports Equipment, sports Persons, and other things dealing with Sports.
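
The “group) tag” convention used in the Themes datasheet can be split mechanically. The sketch below assumes, without confirmation from the datasheet itself, that a theme belonging to no group appears without a parenthesis; the function name is hypothetical.

```python
def split_theme_entry(entry):
    """Split a Themes datasheet entry such as "Art) Music" into
    (group, tag). Entries without ")" are assumed to have no group."""
    group, sep, tag = entry.partition(")")
    if not sep:
        return None, entry.strip()
    return group.strip(), tag.strip()

print(split_theme_entry("Art) Music"))  # ('Art', 'Music')
print(split_theme_entry("Countries"))   # (None, 'Countries')
```

The second element is the tag name to filter on in the Themes field of CommonWords.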

Range and Subrange. The Range field indicates into which of five ranges each word falls. Ranges are intended to provide help in finding words appropriate to the students’ level of mastery. For instance, the 1,000 plus words in Range 1 are all completely regular and completely analyzable if the students have had work with the Range 1 sound-to-spelling correspondences, which are listed below. The ranges are organized so that each of the first four ranges contains only one spelling for each sound and only one sound for each spelling. This regularity is not true of the correspondences in Range 5, due to the existence of several sounds that have more than five different spellings.

Subranges 1a, 1b, 2a, and 2b are subsets of ranges 1 and 2. Subrange 1a consists of Range 1 words that contain only the consonant and short vowel correspondences from Range 1. It contains words with the regular patterns for short vowels – namely, VCC, VC#, and a few digraphs. Subrange 1b consists of words that contain only the consonant and long vowel correspondences from Range 1, and the regular patterns for long vowels – VCe#, VCV, and several digraphs. Subrange 2a consists of Range 2 words that contain only the Range 1 and 2 consonant and short vowel correspondences. Subrange 2b consists of words that contain only Range 1 and 2 consonant and long vowel correspondences.

The Range 1 correspondences are these 35:

The Short Vowels:
[a1] = <a> as in pat
[e1] = <e> as in pet
[i1] = <i> as in pit
[o1] = <o> as in pot
[u1] = <u> as in putt

The Long Vowels and Diphthongs:
[a2] = <a...e> as in mate
[e2] = <ee> as in meet
[i2] = <ie> and <i...e> as in pie and pile
[o2] = <oe> and <o...e> as in doe and dote
[u2] = <oo> as in moot
[yu2] = <ue> and <u...e> as in cue and cute
[oi] = <oi> as in foil
[ou] = <ou> as in foul

The Consonants:
[b] = <b> as in bob
[d] = <d> as in dad
[f] = <f> as in fluff
[g] = <g> as in gag
[h] = <h> as in hot
[j] = <j> as in jot
[k] = <c> as in cot
[l1] = <l> as in lot
[m] = <m> as in mom
[n1] = <n> as in nun
[ng] = <ng> as in bring
[p] = <p> as in pop
[r] = <r> as in roar
[s] = <s> as in sis
[t] = <t> as in tot
[v] = <v> as in vine
[w] = <w> as in wine
[y] = <y> as in yip
[z] = <z> as in zip
[ch] = <ch> as in chin
[sh] = <sh> as in shin
[th1] = <th> as in thin

This may seem like a lot of correspondences, but notice that in nearly every case the spelling uses the same letter as we normally use to symbolize the sound. The symbol “...e>” indicates that the long vowel letter is followed by a silent final <e>, which marks the long vowel sound and can come either right after the vowel letter or separated from it by a single consonant letter. Most of these correspondences are very high frequency. The short <o> here, [o1], collapses the two short low back vowels that are distinguished in Correspondences: [ä], [o4] as in sot, and [], [o1] as in toss. Vowels that precede [r] often differ considerably in pronunciation from the same vowels before other consonants. Consider, for instance, the different pronunciations of <a> in mare and mate. However, since the pronunciation of long <i> and <o> before [r] tends to be very close to their pronunciation before other consonants, I treat them as regular long vowels; thus words like ignore and implore are included in Subrange 1b.

Range 2. The 800+ Range 2 words are completely regular and analyzable if the students have had work with the Range 1 correspondences and the following 33:

The Short and Reduced Vowels:
[e1] = <ea> as in bread
[i1] = <e> as in basket
[o1] = <a> as in ball
[u1] = <o> as in from
[u3] = <oo> as in wood
[u4] (schwa) = <a>

The Long Vowels and Diphthongs:
[a2] = <ai> as in rain
[e2] = <e...e> as in theme
[i2] = <y...e> as in type
[o2] = <oa> as in boat
[u2] = <ue> and <u...e> as in due and dune
[yu2] = <ew> as in few
[oi] = <oy> as in coy
[ou] = <ow> as in cowl
[a3r] = <air> as in hair
[o3r] = <or> as in cord

The Consonants:
[b] = <bb> as in ribbon
[d] = <dd> as in ridden
[f] = <ff> as in stuff
[g] = <gg> as in rugged
[j] = <g> as in large
[k] = <k> as in lake
[l1] = <ll> as in tall
[m] = <mm> as in summer
[n1] = <nn> as in runner
[ng] = <n> as in brink
[p] = <pp> as in happy
[r] = <rr> as in marry
[s] = <c> as in cent
[t] = <tt> as in attic
[w] = <u> as in quit
[y] = [i] as in crystal
[z] = <s> as in dogs
[ch] = <tch> as in catch
[sh] = <s> as in sure
[th2] = <th> as in then

It would be good, though not necessary, for the students to have worked with the reasons for double consonant letters: twinning, the assimilation of consonants at the end of prefixes, simple addition, and the VCC tactical pattern.

Range 3. The 1,000+ Range 3 words are completely regular and analyzable if the students have had work with Ranges 1 and 2 and with the following correspondences and tactical patterns:

The Vowels:
[a1] = <au> as in laugh
[i1] = <y> as in system
[o1] = <aw> as in law
[u3] = <u> as in put
[a2] = <ay> as in day
[e2] = <ea> as in speak
[o2] = <ow> as in low
[u2] = <o...(e)> as in move
[yu2] = <eu> as in feud
[u4] = <e> as in children
[u4r] = <er> as in batter

The Consonants.
[f] = <gh> as in laugh
[h] = <wh> as in whole
[j] = <d> as in graduate
[k] = <ck> as in pick
[r] = <wr> as in write
[s] = <ss> as in miss
[z] = <zz> as in buzz

In addition to these sixteen correspondences Range 3 words assume that the students have had work with two tactical patterns for long vowels: (i) the stressed head vowels of VCV strings are normally long – for instance, the <a> in bacon spells [a2] , and (ii) vowels at the end of syllables are also regularly long – for instance, the <i> in lion spells [i2]. The first of these two, which is essentially an extension of the Range 1 and 2 correspondences with “...e>”, is discussed in AES as the VCV pattern, the second as the V.V pattern.
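
Because the ranges build on one another, the lowest range that could cover a word is the highest range among its correspondences. The sketch below illustrates that idea only. It uses a small excerpt of the lists above, ignores the tactical patterns, and treats the pet and bread analyses as assumptions; the Range field in CommonWords remains the authority on where a word actually falls.

```python
import re

# One "[sound]=<spelling>" pair from a Correspondences entry.
PAIR = re.compile(r"\[([^\]]+)\]\s*=\s*<([^>]+)>")

# Small excerpt of the range lists above; a full table would hold every
# correspondence from Ranges 1 through 5.
RANGE_OF = {
    ("p", "p"): 1, ("t", "t"): 1, ("m", "m"): 1, ("d", "d"): 1,
    ("b", "b"): 1, ("r", "r"): 1, ("e1", "e"): 1,
    ("e1", "ea"): 2,
    ("o2", "ow"): 3,
}

def lowest_range(field):
    """Return the highest range number among a word's correspondences,
    or None if any correspondence is missing from the excerpt table."""
    pairs = PAIR.findall(field)
    if not pairs or any(pair not in RANGE_OF for pair in pairs):
        return None
    return max(RANGE_OF[pair] for pair in pairs)

print(lowest_range("[p]=<p> [e1]=<e> [t]=<t>"))             # 1 (pet, assumed analysis)
print(lowest_range("[b]=<b> [r]=<r> [e1]=<ea> [d]=<d>"))    # 2 (bread, assumed analysis)
print(lowest_range("[m]=<m> [e1]=<ea> [d]=<d> [o2]=<ow>"))  # 3 (meadow)
```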

Range 4. The 1,000+ Range 4 words are completely regular and analyzable if the students have had work with Ranges 1, 2 and 3 and with the following correspondences and tactical patterns:

The Vowels:
[i1] = <a> as in chocolate
[o5r] = <ar> as in hard
[o1] = <au> as in sauce
[u1] = <oo> as in blood
[a2] = <ea> as in break
Weak [e2] = <y> as in funny
Strong [e2] = <ie, ei> as in piece, receive
[u2] = <ew> as in drew
[u4] = <io> as in region
[u4l] = <le> as in jungle
[u4r] = <or> as in doctor
[yu4] = <u> as in deputy
[yu3r] = <ur...(e)> as in cure

The Consonants:
[f] = <ph> as in telephone
[j] = <dg> as in judge
[ks] = <x> as in fix
[k] = <q> as in quit
[n1] = <kn> as in know
[r] = <rh> as in rhythm
[s] = <sc> as in scene
[sh] = <t> as in nation

In addition to these eighteen correspondences Range 4 words assume that the students have worked with silent final <e>’s that serve various diacritical functions other than marking long vowels and with silent final <e>’s that serve no diacritical function at all. It also assumes familiarity with the <i>-before-<e> pattern. Exceptions to this pattern with <ei> are included in Range 5.

Range 5.

The Vowels.
[a3r] = <ar...(e)> as in rare
[a1r] = <ar> as in tariff
[a1r] = <arr> as in carriage
[e2] = <ei> not after <c> as in neither
[e2] = <i> as in machine
[u4] = <i> as in horrible
[u4] = <o> as in million
[u4] = <u> as in awful
[u4] = <ou> as in courteous
[u4r] = <ar> as in coward
[u4r] = <ur> as in injury

The Consonants.
[gz] = <x> as in exact
[k] = <cc> as in account
[k] = <ch> as in school
Syllabic [l] = <l> as in battle
[u1r] = <ear> as in earth
[u1r] = <er> as in term
[u1r] = <ir> as in firm
[u1r] = <our> as in courage
[u1r] = <ur...(e)> as in sure
[t] = <ght> as in night
[hw] = <wh> as in why
[ch] = <t> as in feature
[sh] = <c> as in social
[sh] = <ss> as in mission
[zh] = <s> as in casual

Range 5 words also assume some work with the VCle# long vowel pattern, with the apostrophe, and with non-diacritical, non-final silent <e>’s.

Sound-to-Spelling Correspondences Table

This table contains five fields: (i) Sound and Spelling, which lists the sound-to-spelling correspondences used in the Correspondences field in the CommonWords table; (ii) Examples, which gives an example word for each correspondence; (iii) Instances, which gives the number of words in CommonWords that contain at least one instance of the correspondence; (iv) AES, which cross-references to sections of my American English Spelling dealing with the correspondences; and (v) Sort, which is a number used to set the sort order for the table. The Sort field can also be used to select subsets of correspondences, which are listed in the following order, with the following beginning and ending Sort numbers:

Short Vowels: 1-40
Long Vowels: 41-87
Diphthongs: 88-91
Schwa: 92-111
[r]-Colored Vowels: 112-180
Vowels with initial [y]: 182-191
Consonants: 192-299
Silent Letters: 300-310
Punctuation: 311-314
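
Selecting a subset by Sort number amounts to a range lookup. The following is a minimal sketch with a hypothetical function name; it copies the boundaries exactly as listed, so a number that falls in none of the listed ranges (181, for example) returns None.

```python
# (subset name, first Sort number, last Sort number), as listed above.
SORT_RANGES = [
    ("Short Vowels", 1, 40),
    ("Long Vowels", 41, 87),
    ("Diphthongs", 88, 91),
    ("Schwa", 92, 111),
    ("[r]-Colored Vowels", 112, 180),
    ("Vowels with initial [y]", 182, 191),
    ("Consonants", 192, 299),
    ("Silent Letters", 300, 310),
    ("Punctuation", 311, 314),
]

def sort_subset(sort_number):
    """Return the subset name for a Sort number, or None if unlisted."""
    for name, first, last in SORT_RANGES:
        if first <= sort_number <= last:
            return name
    return None

print(sort_subset(90))   # Diphthongs
print(sort_subset(250))  # Consonants
```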

In the SoundSpelling field, as in the Correspondences field of the CommonWords table, square brackets enclose sounds, arrowhead brackets enclose letters, and the equal sign translates to “is spelled.” Thus, [k]=<c> translates to “the sound [k] is spelled with the letter <c>.” Curly braces mark silent letters: {D} marks silent letters that serve some diacritical function; {ND} marks silent letters with no diacritical function – thus {D}=<e> indicates a diacritical silent <e>, as in time, clothe, ounce, bronze, clause, league, active, while {ND}=<e> indicates a non-diacritical silent <e>, as in fixed.
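
The notation just described can be read mechanically. This sketch, with a hypothetical function name, turns a single entry into a plain-English gloss; it handles only the three forms mentioned here (a bracketed sound, {D}, and {ND}) and makes no claim about other entries the table may contain, such as punctuation.

```python
import re

# Left side is "[sound]", "{D}", or "{ND}"; right side is "<letters>".
ENTRY = re.compile(r"^(\[[^\]]+\]|\{N?D\})\s*=\s*(<[^>]+>)$")

def describe(entry):
    """Gloss one sound-to-spelling entry in plain English."""
    match = ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    left, spelling = match.groups()
    if left == "{D}":
        return f"diacritical silent {spelling}"
    if left == "{ND}":
        return f"non-diacritical silent {spelling}"
    return f"the sound {left} is spelled {spelling}"

print(describe("[k]=<c>"))   # the sound [k] is spelled <c>
print(describe("{D}=<e>"))   # diacritical silent <e>
print(describe("{ND}=<e>"))  # non-diacritical silent <e>
```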

There is room here for honest differences of opinion, especially in view of the sometimes large differences in pronunciation among various dialects. These differences might be expected to arise particularly with the treatment of schwa, [r]-colored vowels, and vowels with initial [y]. Also syllable boundaries can slide around and raise questions, especially with [r]-colored vowels.
