Introduction to the CommonWords Database
The CommonWords database consists of three data tables: (i) CommonWords, a list of 6200+ high frequency words, (ii) SSCorrespondences, a list of the 300+ sound-to-spelling correspondences found in those 6200+ words, and (iii) Themes, a list of 72 themes, or topics, for which words in the CommonWords table have been tagged. Detailed descriptions of the fields in these three tables follow.
CommonWords Table
The CommonWords table contains the following fields, which can be used to filter to specialized word lists:
Word. The Word field can be used to filter to words with various letter strings – for instance, “Word contains sh” returns all words with the consonant digraph <sh> anywhere in the word, while “Word ends with sh” returns only those with final <sh>.
XP. XP contains the same explications as given in the Words table of the larger Lexis database. For more information on elements given in XP, you should consult the appropriate table in the Lexis database. The XP field can be used to filter to words with various prefixes, bases, suffixes, and procedures. The following are some possible search strings:
Correspondences. The Correspondences field gives all sound-to-spelling correspondences found in each word, in order. Due to font limitations the following substitutions represent sounds otherwise represented by non-ASCII characters:
Square brackets enclose sounds, arrowhead brackets enclose letters, and the equal sign translates to “is spelled.” For instance, at meadow the Correspondences field has the following: “[m]=<m> [e1]=<ea> [d]=<d> [o2]=<ow>”. So if you are dealing with phonics, you can filter to certain correspondences or to certain sounds or to certain spellings. Thus, you can filter to all the words in which short <e> is spelled <ea> (there are currently 66 in CommonWords), or to all words that contain short <e>, [e1], however it's spelled (currently 933), or to the <ea> spelling (currently 220) – sometimes spelling short <e>, sometimes long <e> (as in streak), sometimes long <a> (as in steak), and sometimes schwa (as in ocean).
Analysis. Analysis contains a number of orthographically significant features of each word, each of which can be filtered to:
POS. This part-of-speech field indicates the parts of speech that a word can fill. It uses the following codes:
With several words there is not a perfect match between the phonetic analysis in the Correspondences field and the parts of speech in the POS field. For instance, in the Correspondences field the word alternate is analyzed phonetically with a long <a> in the final syllable, which is its pronunciation as a verb. But when used as a noun or adjective, that vowel is destressed to a short <i>. Nevertheless, in the POS field alternate is tagged as verb, and noun, and adjective. One way of thinking about it is that in the Correspondences field we have to settle on one pronunciation, but in the POS field we can take an inclusive view, including heterophonic senses of the word.
Prefixes. Prefixes are listed in two places: (i) the XP field lists any prefixes contained within the listed word, shown with a leading left square bracket; (ii) the Prefixes field lists prefixes that can be added to the listed word. In some cases the word can take a certain prefix only after it has taken certain other affixes. For instance, the word avoid can take the negative prefix un1- only after it has taken the suffix -able, since we do not have the word *unavoid, but we do have unavoidable. A few of the prefixes are numbered to discriminate homographs. To find which prefix each number refers to, see the Prefixes table in the Lexis database.
Suffixes. Suffixes are also listed in two places: (i) the XP field lists any suffixes contained within the listed word, with a following right square bracket; (ii) the Suffixes field lists suffixes that can be added to the listed word, with those in parentheses being suffixes that can be added only after the immediately preceding suffix has been added. For instance, the word flesh can take the comparative suffix -er]02 “more” only after it has taken the adjective suffix -y]1: fleshier but not *flesher *“more flesh”. The ends of strings of embedded suffixes are marked with a double right parenthesis.
Regular nouns, verbs, adverbs, and adjectives can also add the normal inflectional suffixes, though they are not all listed in the Suffixes column. In some cases different suffixes can be affixed to two different senses of a homographic stem. For instance, at the word camp the suffixes -aign and -er01 can only be added to camp1 with the sense “Field, temporary dwelling” while -y1 can only be added to camp2 with the sense “Humorous banality”. Many of the suffixes are numbered. To find which suffix each number refers to, see the Suffixes table in the Lexis database.
Rank. This field is meant to offer some help in deciding when to introduce certain words to students. It is based on the Thorndike-Lorge Teacher’s Word Book of 30,000 Words (New York: Teachers College Press, 1944, 1972), which suggests appropriate grade levels. The levels are cumulative: “A” would include “AA”; “B” would include “A” and “AA”, etc. Obviously this Rank and the following Iowa ranking are both quite approximate:
Words with a T-L score of less than 6 are tagged with that score but not assigned to a grade level.
Iowa. This field suggests the level of difficulty for some of the words in the database, based on the percentages of fourth graders who spelled the given word correctly in The New Iowa Spelling Scale (Iowa City: State University of Iowa, n.d.). A suggested categorization would be:
Characters. The number of characters (letters, punctuation, and blank spaces) in each word.
Syllables. The number of syllables in each word.
Themes. In this field over 4800 words are tagged for the various themes, or topics, with which they can be associated. It is intended to be useful for generating word lists dealing with a common theme, such as “colors” or “sports”. There is nothing very authoritative or exhaustive about these groupings. Subjective judgments abound and occasional violence is done to some formal, scientific categories. All I can say is that on at least one day one retired English teacher saw each word belonging to the various themes for which it was tagged. Due to the widespread homography in English, as a given form moves from one theme to another it often becomes a different word. For instance, the form <lime> in the Fruits theme is a homograph of the form <lime> in the Materials theme – that is, an entirely different word with the same spelling.
The full list of all 71 themes can be found in the Themes datasheet. In that listing a right parenthesis is used to divide a group name from the tag. Thus, in “Art) Music” the group is Art; the word to the right of the parenthesis is the tag name used in the Themes field of CommonWords.
The following comments may be in order. The tag names are in bold face:
The Countries theme includes countries, continents and nationalities.
The Location theme includes both locations and directions.
The Occupation theme includes occupations and titles.
The Health theme includes words dealing with health, sickness, and death.
Several themes are parts of larger groups:
The Animals group consists of four themes: Birds, Insects, Mammals and the more miscellaneous Animals, for everything else of or about an animal nature.
Similarly, the Art group consists of four themes: Literature, Music, Visual arts, and plain Art for words that cut across all three types or are difficult to assign to any of the three specific arts.
The Feelings group is divided into three themes: feelings or emotions that can in some sense be said to be Negative, those that can be said to be Positive, and more ambiguous or neutral emotions, tagged simply Feeling.
The Food group is divided into seven themes: Drink, Fruit (including nuts), Grains (including bread), Meat (including dairy products, fish, and poultry), Sweets, Vegetables, and more generally, Food.
The Government group is divided into two themes: government People and more general Government.
The Math group includes Numbers and more general mathematical terms, tagged Math.
The Measure group includes Amount (including sizes), Calculation consisting of calculated measurements, Units of measurement, Value consisting of measurements to which we ascribe subjective values, and Measure.
The Military group consists of military Paraphernalia and equipment, military Personnel, and Military.
The Science group consists of Biology, Chemistry, Geography (features and places), and Science.
The Sports group consists of sports Equipment, sports Persons, and other things dealing with Sports.
Range and Subrange. The Range field indicates into which of five ranges each word falls. Ranges are intended to provide help in finding words appropriate to the students’ level of mastery. For instance, the 1,000+ words in Range 1 are all completely regular and completely analyzable if the students have had work with the Range 1 sound-to-spelling correspondences, which are listed below. The ranges are organized so that each of the first four ranges contains only one spelling for each sound and only one sound for each spelling. This regularity is not true of the correspondences in Range 5, due to the existence of several sounds that have more than five different spellings.
Subranges 1a, 1b, 2a, and 2b are subsets of ranges 1 and 2. Subrange 1a consists of Range 1 words that contain only the consonant and short vowel correspondences from Range 1. It contains words with the regular patterns for short vowels – namely, VCC, VC#, and a few digraphs. Subrange 1b consists of words that contain only the consonant and long vowel correspondences from Range 1, and the regular patterns for long vowels – VCe#, VCV, and several digraphs. Subrange 2a consists of Range 2 words that contain only the Range 1 and 2 consonant and short vowel correspondences. Subrange 2b consists of words that contain only Range 1 and 2 consonant and long vowel correspondences.
The Range 1 correspondences are these 35:
This may seem like a lot of correspondences, but notice that in nearly every case the spelling uses the same letter as we normally use to symbolize the sound. The symbol “...e>” indicates that the long vowel letter is followed by a silent final <e>, which is marking the long vowel sound and can be either right after the vowel letter or separated from it by a single consonant letter. Most of these correspondences are very high frequency. The short <o>, this [o1], collapses the two short low back vowels that are distinguished in Correspondences: [ä], [o4] as in sot, and [
Range 2. The 800+ Range 2 words are completely regular and analyzable if the students have had work with the Range 1 correspondences and the following 33:
It would be good, though not necessary, for the students to have worked with the reasons for double consonant letters: twinning, the assimilation of consonants at the end of prefixes, simple addition, and the VCC tactical pattern.
Range 3. The 1,000+ Range 3 words are completely regular and analyzable if the students have had work with Ranges 1 and 2 and with the following correspondences and tactical patterns:
In addition to these sixteen correspondences, Range 3 words assume that the students have had work with two tactical patterns for long vowels: (i) the stressed head vowels of VCV strings are normally long – for instance, the <a> in bacon spells [a2], and (ii) vowels at the end of syllables are also regularly long – for instance, the <i> in lion spells [i2]. The first of these two, which is essentially an extension of the Range 1 and 2 correspondences with “...e>”, is discussed in AES as the VCV pattern, the second as the V.V pattern.
Range 4. The 1,000+ Range 4 words are completely regular and analyzable if the students have had work with Ranges 1, 2 and 3 and with the following correspondences and tactical patterns:
In addition to these eighteen correspondences, Range 4 words assume that the students have worked with silent final <e>’s that serve various diacritical functions other than marking long vowels and with silent final <e>’s that serve no diacritical function at all. They also assume familiarity with the <i>-before-<e> pattern. Exceptions to this pattern with <ei> are included in Range 5.
Range 5.
Range 5 words also assume some work with the VCle# long vowel pattern, with the apostrophe, and with non-diacritical, non-final silent <e>’s.
Sound-to-Spelling Correspondences Table
This table contains five fields: (i) Sound and Spelling, which lists the sound-to-spelling correspondences used in the Correspondences field in the CommonWords table; (ii) Examples, which gives an example word for each correspondence; (iii) Instances, which gives the number of words in CommonWords that contain at least one instance of the correspondence; (iv) AES, which cross-references to sections of my American English Spelling dealing with the correspondences; and (v) Sort, which is a number used to set the sort order for the table. The Sort field can also be used to select subsets of correspondences, which are listed in the following order, with the following beginning and ending Sort numbers:
In the SoundSpelling field, as in the Correspondences field of the CommonWords table, square brackets enclose sounds, arrowhead brackets enclose letters, and the equal sign translates to “is spelled.” Thus, [k]=<c> translates to “the sound [k] is spelled with the letter <c>.” Curly braces mark silent letters: {D} marks silent letters that serve some diacritical function; {ND} marks silent letters with no diacritical function – thus {D}=<e> indicates a diacritical silent <e>, as in time, clothe, ounce, bronze, clause, league, active, while {ND}=<e> indicates a non-diacritical silent <e>, as in fixed.
There is room here for honest differences of opinion, especially in view of the sometimes large differences in pronunciation among various dialects. These differences might be expected to arise particularly with the treatment of schwa, [r]-colored vowels, and vowels with initial [y]. Also syllable boundaries can slide around and raise questions, especially with [r]-colored vowels.
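The bracket notation described above is regular enough to be parsed mechanically, which may help anyone working with an exported copy of the data outside the database itself. The following is only a minimal sketch in Python, not part of the database: the function names parse_correspondences and has_correspondence are hypothetical, and the sketch assumes that a Correspondences value is a plain string of pairs written exactly in the notation above. The only sample strings used are ones quoted in this description.

```python
import re

# One correspondence: a sound in [...] or a silent-letter marker in {...},
# then "=", then a spelling in <...>. Spaces around "=" are tolerated.
# (Assumption: exported field values follow this notation exactly.)
PAIR = re.compile(r"(\[[^\]]+\]|\{[^}]+\})\s*=\s*<([^>]+)>")


def parse_correspondences(field):
    """Return (sound, spelling) pairs in the order they appear in the field."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(field)]


def has_correspondence(field, sound=None, spelling=None):
    """True if some pair matches the given sound and/or the given spelling."""
    return any(
        (sound is None or s == sound) and (spelling is None or sp == spelling)
        for s, sp in parse_correspondences(field)
    )


# The meadow entry quoted in the description of the Correspondences field.
meadow = "[m]=<m> [e1]=<ea> [d]=<d> [o2]=<ow>"

print(parse_correspondences(meadow))
print(has_correspondence(meadow, sound="[e1]", spelling="ea"))  # short <e> spelled <ea>
print(has_correspondence(meadow, sound="[e1]"))                 # short <e>, however spelled
print(has_correspondence(meadow, spelling="ea"))                # <ea>, whatever it spells
print(parse_correspondences("{D}=<e>"))                         # silent-letter marker
```

The three calls to has_correspondence mirror the three kinds of filter mentioned for the Correspondences field: by correspondence, by sound alone, and by spelling alone. Keeping the brackets or braces as part of the sound string keeps [e1] distinct from a silent-letter marker such as {D}.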