A site for spellers, teachers of spelling and reading, and students of english words

Introduction to the Lexis Database
2011 Upgrade

At the very least we need a systematic catalog of the elements of American English. Bound bases in particular remain undescribed in any useful way. We need a systematic catalog of them, just as we need a thorough catalog of the functioning sets of coelements in the language. It would be interesting and useful to have full explications of a large sample of the American English lexicon.

American English Spelling, p. 462.

An Overview

The Lexis database is an attempt to address the needs outlined above, especially the final one. It is essentially a lengthy exercise in explication – that is, the analysis of written words into (i) their written parts that contribute semiotic or syntactic sense to them, (ii) various particles and vestiges, (iii) historical processes such as assimilation that have affected their spelling, and (iv) the procedures that spellers must follow in spelling them. The following explication of sufficiently can illustrate: [su/b+f+fic/e1+ient]+ly]1. This explication analyzes sufficiently into the prefix sub-, changed to the spelling <suf> via the historical process of assimilation, with the letter <b> – and the sound [b] – being deleted and replaced with <f>, followed by the base +fice1. The number 1 following the bound base +fice indicates that this base is the first of at least two homographic bases spelled <fice>. The virgule before the <e> indicates that the expanded suffix -ient was added to suffice via the procedure of silent final <e> deletion. The absence of a virgule before -ly1 indicates that this suffix was added to sufficient via the procedure of simple addition.

Explication speaks to Ferdinand de Saussure’s distinction between the arbitrariness and the relative motivation of the linguistic sign. Saussure divided the sign into the signifier (its expression) and the signified (its content). He makes his point unequivocally:

The bond between the signifier and the signified is arbitrary. Since I mean by the sign the whole that results from the associating of the signifier with the signified, I can simply say: the linguistic sign is arbitrary. (Course in General Linguistics, 67, his emphasis).

This much has become almost mantric in modern linguistics. However, Saussure went on to draw a distinction between this radical arbitrariness and a more orderly quality that he called motivation:

Some signs are absolutely arbitrary; in others we note, not its complete absence, but the presence of degrees of arbitrariness: the sign may be relatively motivated. (131, again his emphasis).

For example, a simplex word like, say, six is, in Saussure’s view, absolutely arbitrary in its association of expression and content, as is evidenced by the fact that other languages have quite different expressions for conveying the content “six.” However, a complex word like sixteen is not absolutely arbitrary and can be said to be at least relatively motivated because it can be analyzed into two components, six and teen, which he calls syntagms and I call elements. Each of these elements relates sixteen with several other words in the language: Six relates it paradigmatically to sixty, sixth, twenty-six, and so on. Six also relates it via scalar metonymy to seven, seventy, seventh, twenty-seven – and to all other such numbers in the number system. The element teen relates sixteen to such words as thirteen, fourteen, teenage, fifteenth, even teenybopper. More remotely teen “10” relates sixteen to words like thirty and fifty, which contain the base ty1 “times ten”. These paradigmatic relationships provide the orderliness that Saussure calls relative motivation. Saussure says that “motivation varies, being always proportional to the ease of syntagmatic analysis and the obviousness of the meaning of the subunits present” (132). Explication is meant to increase “the ease of syntagmatic analysis” and to heighten “the meaning of the subunits.”

Saussure also argues that

Everything that relates to language as a system must, I am convinced, be approached from this viewpoint, which has scarcely received the attention of linguists: the limiting of arbitrariness. . . . [T]he whole system of language is based on the irrational principle of the arbitrariness of the sign, which would lead to the worst sort of complication if applied without restriction. But the mind contrives to introduce a principle of order and regularity into certain parts of the mass of signs, and this is the role of relative motivation. (133)

It is precisely these effects, “the limiting of arbitrariness” and the concomitant heightening of motivation, that explication works to increase in the written lexicon, as part of the search for increased order and regularity.

Explication and some of its problems are discussed in chapter two of my American English Spelling (Johns Hopkins, 1988). These issues are discussed further in the article “Explication, Evolution, and Orthography” in the Short Articles section of this website. Some of the major problems with the practice of explication are discussed in the final section of that article, “Problems of Explication.”

The 2011 Version of Lexis

When I explicated the words in the original version of Lexis, I was most concerned with an economy that emphasized two things: (i) minimizing the number of parts and (ii) minimizing the number of complex procedures involved in word-formation – a complex procedure being defined as any procedure other than simple addition. Thus, I tended to avoid explications that required silent final <e> deletion other than in the inflection of free stems that ended in silent final <e> – as in, say, fired fir/e+ed]1. I also tried to avoid running counter to history, trying, for instance, to reflect what is taken to be a word’s original formation. Also, minimizing the number of parts led to much merging of bases with adjacent particles and vestiges. Over time I grew uneasy with that whole approach: All of that merging, in particular, tended to obscure the identity of bases, which provide the semiotic core of words. My approach to language is at heart phenomenological, concerned with how language is experienced by its users. Thus, in this 2011 version I’ve emphasized economy and history less and the salience of bases more. I have become more willing to make use of unhistorical explications – that is, explications that run counter to etymology: For instance, given a pair of coforms – one with silent final <e>; the other identical but without the final <e> – the major principle is this: If explicating to the coform with silent final <e> is orthographically justified, do so. Only explicate to the form without final <e> if you cannot justify a silent final <e> deletion. The form with silent final <e> is privileged because it is usually terminative and thus usually has end-focus and emphasis. The form without final <e> is usually either initial or medial, where it receives less emphasis.

Thus, in the {phote, phot} set of co-forms we have photograph now explicated to phot/e+o4+graph, photo explicated to phot/e+o]3, and photic to phot/e+ic]1, even though there is no free form *phote and from the historical point of view there was quite surely no <e> deletion involved in the formation of any of those words. On the other hand, isophot must explicate to is2+o4+phot, since there would be no way orthographically to justify <e> deletion – that is, no *is2+o4+phot/e.

Deleting silent final <e>’s before particles like o4 – as in phot/e+o4+graph – requires a modest revision of the usual final <e> deletion rule, which restricts <e> deletion to cases where one is adding a suffix that starts with a vowel to a free stem ending in silent final <e>. (For more details see chapter eight of my American English Spelling, especially section 8.2.)
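
The coform principle can be sketched in a few lines of Python. This is only an illustration, not part of Lexis: the function name attach_base is hypothetical, and the test for "orthographically justified" is simplified to the assumption that some following element begins with a vowel letter, which is the revised final <e> deletion rule described above in its barest form.

```python
VOWELS = set("aeiou")  # assumption: only these letters count as vowel-initial

def attach_base(base_with_e, following):
    """Build an explication fragment for a base that has a silent final <e> coform.

    base_with_e: the privileged coform, e.g. "phote"
    following:   the elements that follow it, e.g. ["o4", "graph"]
    """
    if not base_with_e.endswith("e"):
        raise ValueError("expected the coform that ends in silent final <e>")
    short_form = base_with_e[:-1]
    # Deletion is treated as justified only when a vowel-initial element follows.
    if following and following[0][:1].lower() in VOWELS:
        return "+".join([short_form + "/e"] + following)
    # Otherwise fall back to the coform without final <e>.
    return "+".join([short_form] + following)

print(attach_base("phote", ["o4", "graph"]))   # phot/e+o4+graph
print(attach_base("phote", ["ic]1"]))          # phot/e+ic]1
print("is2+o4+" + attach_base("phote", []))    # is2+o4+phot
```

The three calls reproduce the explications given above for photograph, photic, and isophot. Real decisions about justification involve more than the first letter of the next element, so the vowel test should be read as a stand-in.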

My rather generous expansions of suffixes continue the process mentioned in "Note of Suffixes" in Glare's The Oxford Latin Dictionary: "One of the most characteristic features of Latin suffixes is their growth by misdivision: for instance, the elementary suffix -nus gives rise to a group of secondary suffixes -ânus, -înus, -ernus, -tinus" (xxiii). Glare's use of the word misdivision is interesting: I would describe it as the natural change and growth of the language, driven primarily, I suspect, by the contention between element and syllable boundaries. Beyond the common expansion of suffixes and the less common expansion of prefixes, I have resisted expanding bases, leaving independent those particles and vestiges that follow bases and don’t attach to expanded suffixes – an approach that greatly increases the number of plus signs in some explications and also puts off to a later date the question of what, if anything, to do about those particles and vestiges.

The Four Lexis Data Tables

The Lexis database contains four data tables. The first, Words, contains the lexicon of 129,029 words, each with its explication, as illustrated above with sufficiently. The second table, Bases, contains the 16,980 free and bound bases contained in those explications. The third table, Prefixes, contains the 245 prefixes from the explications in Words, and the fourth, Suffixes, contains the 1,197 suffixes. More detailed introductions to the tables are given below.

Using Lexis

One typical use of Lexis would be to find a word's explication in the Words table and then, shifting to the other tables, finding more about its constituent elements, processes, and procedures. If, on the other hand, you are interested not in elements but in simple letter strings, you could, for instance, search for the string <mpt> in the Word field, which would return a set of 145 words, from ademption to unkempt. If you were interested only in words in which <mpt> is in a single element, typing <mpt> in the Explication field would return 114 words from ademption to transumption.
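
The difference between searching the Word field and searching the Explication field can be illustrated with a small Python sketch. It assumes that a search is a plain substring match, which is a guess about how the on-line filter behaves. The rows are explications quoted elsewhere in this introduction, and the helper name search is hypothetical.

```python
# A handful of (word, explication) pairs taken from this introduction.
WORDS = [
    ("sufficiently", "[su/b+f+fic/e1+ient]+ly]1"),
    ("photograph", "phot/e+o4+graph"),
    ("photic", "phot/e+ic]1"),
    ("isophot", "is2+o4+phot"),
]

def search(rows, text, field):
    """Return the words whose chosen field contains `text` as a substring."""
    column = {"word": 0, "explication": 1}[field]
    return [row[0] for row in rows if text in row[column]]

# <ffi> occurs in the spelling of sufficiently, but it straddles an
# element boundary (f+fic), so it never occurs inside the explication.
print(search(WORDS, "ffi", "word"))          # ['sufficiently']
print(search(WORDS, "ffi", "explication"))   # []

# <phot> sits inside a single element, so both searches find it.
print(search(WORDS, "phot", "explication"))  # ['photograph', 'photic', 'isophot']
```

A string that is interrupted by a plus sign, a virgule, or a homograph number in the explication will not be matched there, which is what narrows the <mpt> result from 145 words to 114.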

In the Comment fields in the Bases, Suffixes, and Prefixes tables there are key words that can provide some informative searches and that are listed below for the separate tables. The Prefixes, Bases, and Suffixes tables can be sorted on the Instances field to isolate low and high frequency forms.

Introduction to the Words Table

The only fields in the Words table are Word and Explication. The collection of words, though quite extensive, is meant to be a sample. One special point: Many of the words in Lexis have homographic forms, of which Words includes only a few. A second point: Lexis contains all the words from the Index of Words of my American English Spelling, which means that it contains an unusual number of words with spellings that are odd or peripheral to our English spelling system, such as ngaio, a word borrowed from Maori referring to a type of small New Zealand tree.

The following symbols are used in the Explications field: Plus marks indicate internal boundaries between elements, vestiges, and particles. Virgules indicate that the following letter is to be deleted. Left square brackets mark the beginning of prefixes; right square brackets mark the end of suffixes. Numbers following elements and particles discriminate homographic forms.
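
Read as a notation, these symbols are regular enough that the spelling of a word can be recovered mechanically from its explication. The following Python sketch shows one way to do so. The function name spell is hypothetical, and the sketch assumes that every digit in an explication is a homograph number and that every virgule deletes exactly one following letter.

```python
import re

def spell(explication):
    """Recover a word's spelling from its explication."""
    # A virgule and the single letter after it are dropped together.
    without_deleted = re.sub(r"/.", "", explication)
    # Plus signs, square brackets, and homograph numbers are only markup.
    return re.sub(r"[+\[\]0-9]", "", without_deleted)

print(spell("[su/b+f+fic/e1+ient]+ly]1"))  # sufficiently
print(spell("fir/e+ed]1"))                 # fired
print(spell("phot/e+o4+graph"))            # photograph
```

Running such a function over the whole Words table and comparing the result with the Word field would be one way to catch typos in explications.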

Introduction to the Bases Table

The Bases table contains the following seven fields: (i) Bases, (ii) Examples, (iii) Instances, (iv) Free, (v) Sense Links, (vi) Comments, and (vii) Relatives.

(i) and (ii). The Base and Examples fields are pretty much self-explanatory, except that the Base field also lists particles, which are tagged “Particle” in the Comment field. And the Examples field lists some (often all) of the words in Lexis that contain that base.

(iii) The Instances field gives the number of words in Lexis in which the base in question appears. A caveat: As changes were made in the Words and Bases tables, the counts changed, and it’s likely that some of the listed counts are off a bit. They are accurate enough for discussions of general patterns and relative frequency, but if you need precise figures, I suggest filtering the Words table on the base in question to double-check the count.
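
For readers who load the text files into their own tools, the double-check suggested here might look like the following Python sketch. It is an illustration under stated assumptions: the helper names are hypothetical, the rows are explications quoted in this introduction, and an element is treated as a base (or particle) when it carries no square bracket.

```python
WORDS = [
    ("photograph", "phot/e+o4+graph"),
    ("photo", "phot/e+o]3"),
    ("photic", "phot/e+ic]1"),
    ("isophot", "is2+o4+phot"),
    ("sufficiently", "[su/b+f+fic/e1+ient]+ly]1"),
]

def base_elements(explication):
    """Return the unbracketed elements, with virgules removed (fic/e1 -> fice1)."""
    pieces = explication.split("+")
    return [p.replace("/", "") for p in pieces if "[" not in p and "]" not in p]

def count_instances(rows, base):
    """Count the words whose explication contains `base` as a whole element."""
    return sum(1 for _word, expl in rows if base in base_elements(expl))

print(count_instances(WORDS, "phote"))  # 3
print(count_instances(WORDS, "phot"))   # 1
print(count_instances(WORDS, "o4"))     # 2
```

Matching whole elements rather than substrings keeps phot from being counted inside phote, which is the kind of slip that a plain text filter can make.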

(iv) The Free field tags free bases. Untagged bases are bound. The tagging was determined by filtering to words in the Words table whose explications do not include a plus sign marking an internal boundary.

(v) The Sense Links field summarizes the various senses carried by that base throughout its history, often running back to its proposed Proto-Indo-European (PIE) root. It does not attempt systematically to present the evolution of those senses. Rather, it tries to show some of the senses that have been associated with words containing the base over the centuries. Thus, it suggests how various metaphoric and metonymic relationships have produced the various senses. (For more on metaphor and metonymy, see the article “Explication, Evolution, and Orthography” in the Short Articles section of this website.)

Searching for a semicolon in the Sense Links field returns all bases assumed to have evolved from a PIE root. The senses of the roots are given to the left of the semicolons in the Sense Links field. These assumed senses present a major complication: Primitive languages tend to be concrete and specific affairs, with abstraction and generality developing later. But in the comparative method used to reconstruct the senses of PIE roots, the need to find common themes that link cognate words from often widespread languages leads to proposed senses for roots that are often much more vague and diffuse, much more general and abstract than the terms actually must have been in PIE. In spite of this probably ahistoric generality, the assumed PIE senses can help uncover the metaphoric and metonymic links that concern us.

(vi) The Comments field contains brief notes on that base’s structure, such as the contraction and expansion of earlier forms. In Comments the keyword Imitative covers a multitude of types. Sometimes it’s a straightforward imitation of a natural sound, such as in moo and caw, or oh and ooh. Sometimes it is not clear what is being imitated, as with jab and jam, where it is perhaps more like sound symbolism or phonaesthesia. Other keywords in the Comments field: Particle, Reduplicative, Folk ety(mology), Contracts, Expands, Alters, Varies, Nonterm(inative) and Term(inative) coform, Eponym, Merges, Redivides, ooo (Of Obscure Origin), Past (tense), Past participle, Archaic, Coined by, Converts, Trademark, Plural; and the source languages Spanish, French, Anglo-Norman, British, Scots, Dutch, Arabic, Chinese, Hindi, Swedish, Norwegian, Finnish, Italian, Portuguese, Latin, Greek. The Comments field also contains a few warnings about unusual deletions, tagged with the keyword Check – as in “Check +vi/e & +v/i/e” (at vie), necessary to catch the inflected forms vied and vying.

I’m sensitive to the fact that the English lexicon and its morphology are evolving complex systems, so an important part of explication is the attempt to capture some sense of the direction that things are taking. Much of this attempt can be seen by filtering on the keywords Expands and Contracts in the Comment field. In the lists that are returned will be many cases in which historical bases have been expanded by the accretion of vestiges from earlier stems or contracted via the metonymic part-for-whole relationship, especially in words from the scientific-technical register.

(vii) The Relatives field describes unifying links among the base in question and various other elements. Assumed PIE roots for bases are indicated in the Relatives field, preceded by an asterisk, as in *skep. A few of these roots are actually Latin, Greek, or Germanic roots, not Proto-Indo-European.

It is important to remember that the PIE roots are all reconstructions, arrived at by comparing words in the various languages assumed to have descended from the mother tongue and applying the rules of sound change that grammarians have developed. No direct traces of PIE exist. All of the roots are assumptions. In his American Heritage Dictionary of Indo-European Roots Calvert Watkins uses the word perhaps often, as is sometimes reflected in Bases via question marks. As in any form of archaeology, etymological conclusions are often based on little hard evidence, and conclusions and assumptions can change drastically with the discovery of new evidence. Thus, there can be, and is, considerable disagreement, not only about whether a given modern base descends from a certain PIE root, but also about the form and semiotic content of that root (and at times its very existence). In compiling the list of roots for the Relatives field, I’ve taken what might be called a “loose constructionist” approach – that is, since I am more interested in finding plausible unifying links than in determining certain true cognates, I tend to include a link if one source finds it plausible, even though others do not.

In the Relatives field, due to font limitations, some special characters have substitute symbols: Schwa is represented with the “at” sign, @; long vowels are represented with capital letters. The superscript numerals used by Watkins to distinguish among homographic PIE roots are here represented with regular numerals.
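
Anyone who wants to display the roots with conventional symbols can reverse these substitutions. The Python sketch below is one possible approach, not a description of how Lexis stores or renders its data. The function name display_root is hypothetical, the second and third sample strings are invented to show the mapping, and the sketch assumes that the long vowels are written with the capitals A, E, I, O, and U.

```python
# Capital vowel letters stand for long vowels (assumed to be A, E, I, O, U).
LONG_VOWELS = str.maketrans("AEIOU", "āēīōū")

def display_root(encoded):
    """Turn the plain-text encoding of a root into conventional symbols."""
    return encoded.replace("@", "ə").translate(LONG_VOWELS)

print(display_root("*skep"))   # unchanged: *skep
print(display_root("*gen@"))   # schwa restored
print(display_root("*bhA2"))   # long vowel restored, numeral kept
```

The function should be applied only to the root itself. The rest of a Relatives entry may contain ordinary capital letters that are not long vowels.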

The Relatives field, besides giving roots when available, suggests some of the other elements that are related to the current base. The related elements are marked with leading plus signs, which can be thought of as substitutes for italics. The treatment does not pretend to be exhaustive, whatever exhaustive might mean here. Of course, all bases with the same PIE root are assumed to be related, so bases returned by a search in the Relatives field for a specific root are presumably related to one another to some degree, and in general the more similar their Sense Links fields are, the more closely related the bases are. To pursue these relationships further you should consult the family trees provided by Watkins, either in his The American Heritage Dictionary of Indo-European Roots or in his Indo-European appendices to the 1st, 3rd, and 4th editions of the American Heritage Dictionary.

In the Relatives field, the references are usually to other bases. When prefixes are listed – for instance, at pert1 there is “+[a4”, the prefix [a4- in the Prefixes table – that means that there is at least one word containing that base that is an aphetic form of an earlier word with that prefix, as is pert1 itself, which is from ME apert [a4+pert. There are a number of words that have lost an initial euphonious e4 particle, which was added in French and Spanish because those languages tend to avoid the initial clusters <sc>, <sp>, <sq>, and <st>. Additional instances of lost e4 can be found by filtering the Bases field to bases with initial <sc>, <sp>, <sq>, or <st> and examining the returned bases to exclude any that are not French or Spanish adaptations.

Acknowledgements

This is very much a derivative study, based on the primary research of others: James Murray, Henry Bradley, Calvert Watkins, Julius Pokorny, Eric Partridge, and the editors of The Barnhart Dictionary of Etymology. Although the explications are my own work, much of the Bases table, especially the Sense Links and much of the Relatives fields, gathers together findings from many earlier studies.

The great majority of PIE roots are drawn from Calvert Watkins’ The American Heritage Dictionary of Indo-European Roots (Boston: Houghton Mifflin, 1985). A few are drawn from the slightly different and shortened versions of Watkins’ list given as appendices to the 1st, 3rd, and 4th editions of the American Heritage Dictionary. Even fewer are drawn from The Barnhart Dictionary of Etymology (H. W. Wilson, 1988). A few have come from Partridge’s Origins, fewer from the interactive version of Julius Pokorny’s seminal Indogermanisches Etymologisches Wörterbuch (Bern, 1959), available at Leiden University’s website http://www.indo-european.nl/.

Those roots from Watkins are listed with no annotation beyond the root itself and its presumed sense recorded in the Sense Links field, followed by a semicolon. Those from Barnhart have the Pokorny page number, as in “(Pok 345).” Those from Partridge are labeled “Partridge,” and those few from Leiden are labeled “Leiden.” Some of those from the other sources involve roots listed by Watkins but without the bases and words in question being listed as descendants in Watkins, though they are so grouped in Partridge, even if he does not directly mention any PIE root.

Introduction to the Prefixes Table

The Prefixes table contains the 245 prefixes found in the explications of the 129,029 words in Words. It is a rather long and eclectic list that includes, among other things, some prefixes from very rare adoptions – such as the plural markers ema- (emalangeni) and ma- (makota) from Africa, and the noun markers mi- (mikado) and sa1- (samurai) from Japanese. Such prefixes are obviously quite peripheral to the English prefix system, but there they are in their adopted words. Since the primary motive of explication is to highlight potential unifying links in the English lexicon, it seems best to explicate to these alien prefixes, rare and exotic though they may be.

In addition to the Prefix field, the Prefixes table includes the regular Examples, Instances, and Comment fields. In the Comment field, since many prefixes carry considerable semiotic content, their senses are given, in quotation marks: [ultra- “Beyond, beyond the norm.” The Comment field also contains a number of keywords: Contracts, Alters, Expands, Varies, and Marks and Forms, which show syntactic function. An especially important keyword is Assimilation, which tags assimilated forms of several prefixes. The Comment field also contains the keyword Search, followed by the syntax for filtering to given assimilated forms – for example, at [ac1-, you find “Search [a/d+c+.” Finally, the Comment field lists a number of source languages and country names: Old English, German, Latin, French, Spanish, Greek, Italian, Romance, Russian, Japanese – and from Africa: African, Swaziland, Bemba, Bantu, Zaire, Lesotho.
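
The Search strings follow a regular pattern that can be generated rather than looked up. The Python sketch below is only an illustration: the function name assimilation_search is hypothetical, and the pattern is inferred from the two examples in this introduction, “[a/d+c+” and the opening of the explication of sufficiently.

```python
def assimilation_search(prefix, replacement):
    """Build the filter string for an assimilated prefix.

    The final letter of the unassimilated prefix is marked as deleted with a
    virgule, and the replacement letter follows as a separate element.
    """
    return f"[{prefix[:-1]}/{prefix[-1]}+{replacement}+"

print(assimilation_search("ad", "c"))    # [a/d+c+
print(assimilation_search("sub", "f"))   # [su/b+f+

# The generated string can then be used as a filter on explications.
explication = "[su/b+f+fic/e1+ient]+ly]1"
print(explication.startswith(assimilation_search("sub", "f")))  # True
```

Prefixes that assimilate in some other way would need their own treatment, so the Search entries in the Comment field remain the authority.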

Definition of Prefixes

In his English Word Formation Hans Marchand uses a rather restrictive definition of prefix (“Prefixes are bound morphemes which are preposed to free morphemes” [129]) and discusses only 65 (129-208). At the other extreme, in his Origins Eric Partridge uses Webster’s much less restrictive definition of prefix: “One or more letters or syllables combined or united with the beginning of a word to modify its signification, as pre- in prefix, con- in conjure” (821). Partridge goes on to say that “Strictly, a prefix should consist of either a preposition or adverb”, and he omits what he calls the “false prefixes of science”, which he describes as not prefixes, but abbreviations. Still, he lists more than 340 prefixes, including many homographic and variant forms – for example, he lists seventeen different prefixes spelled <a>.

So the question of what a prefix is remains somewhat undecided. Dictionaries do not agree on the distinction between prefixes and bases, especially those usually bound bases called “combining forms.” For instance, though the AHD uses electr+ as an example of a combining form (at “combining form”), in the main word list it is labeled “prefix,” as are all other combining forms. On the other hand, the RHUD and W3 both distinguish carefully between prefixes and combining forms. At a different extreme the editors of Prefixes and Other Word-Initial Elements of English collapse the distinction completely, speaking only of “word-initial elements” in their list of nearly 3,000 forms.

Elements signifying numerical values can illustrate the confusion: In W3 bi- “two” is labeled a prefix, but tri- “three” is a combining form. RHUD labels both as combining forms; AHD labels both as prefixes. For the sake of simplicity, I treat all numerical elements as combining forms – that is, bases, usually bound – and restrict prefixes essentially to prepositions (in2-, ad-), negatives (in1-, non-, un1-), adverbs (se-, per1-), a few derivationals (en1- and be-), and even fewer inflectionals, such as those in samurai and mikado.

There can be indecision whether the Romance euphonic <e>, as in escalate, is an initial particle or a prefix. Arguing against the former is the fact that all other particles are medial, not initial. Arguing against the latter is the fact that euphonic <e> does not have any semiotic content. For now I explicate the euphonic <e> as a particle, listed as e4 in the Bases table and tagged in the Comment field as “Particle.”

Converted Prefixes

The practice of synecdoche and conversion in English has led to several cases in which one-time prefixes have been converted to bases. Some examples: The base com5 contracts communication (as in intercom), and com7 contracts commissioned (as in noncom). Both convert the prefix [com- to a base. In abo ab2+o]3 ab2 is clipped from aborigine [ab1+orig+in]02+e]3. Selsyn sel4+syn1 contracts “sel(f) + syn(chronous)” [syn+chron+ous]. In cistron cistr+on]08 the base cistr contracts the phrase “cis-trans test”. Since [cis- and [trans- are both prefixes, the bound base cistr is another example of conversion. The very common free base pro1 converts the prefix pro1- in professional.

Often the prefix merges with part or all of the following base in the conversion: In propane prop2+ane]3 the base prop2 contracts propionic [pro2+pion2+ic]1. In praetor prae+tor], pretor pre+tor] and their derivatives the prae+ and pre+ convert the Latin prefix pre- to a base and absorb the <e> from a now lost Latin base meaning “go”. Combo comb3+o]3 contracts combination, with the prefix com-. Similar examples: comfy, commie, comp3, condo. Pregnant pregn+ant]1 is ultimately from a Latin source that contains the prefix [prae- plus the source of our modern base gn2 “born”. Related are slang terms such as preggers, preggy, and preggo.

In the other direction, bases are sometimes converted to prefixes, as is apparently the case in sovereign, which I explicate as [sove+reign. The word has evolved from ME soverain, influenced by reign. The ME soverain is from OF soverain, from Vulgar Latin *superanus, which would be the Latin super “above” plus the adjective suffix -anus. Sovereign is a good example of folk etymology: As the final syllable was respelled <reign>, the respelled Romance suffix -ain was converted from suffix to base. It would also follow that the first syllable, which started out as a base, has converted to a prefix, a variant of the English prefix super-.

The explication of prefixes that descend from the Latin preposition ad poses some questions. At one extreme are those words in which the Latin preposition had already become used as an assimilated prefix in Latin, words like accent and accident. At the other extreme are words from French phrases with the contracted French preposition à, with no double consonant: abase, abate, abut. The former I explicate as [a/d+; the latter as [a3+. Problems arise with words between these extremes, with words like acclimate, which developed the <cc> in French, and accompany, which developed it in English. In such words there never was an assimilation of ad-. But rather than positing a separate prefix ac-, I recognize the power of analogy and folk etymology and treat such words as if they were assimilated forms of ad-. Basically, if there is a double consonant, or its equivalent, I'm inclined to explicate to [a/d+, motivated by analogy. If there is no double consonant, I'm inclined to explicate to the closely related [a3+.

Introduction to the Suffixes Table

The Suffixes table contains the 1,197 suffixes explicated out in Words.

In the Comment field there are a number of potentially useful search and filter strings. The following are the more common for the country of origin: Greek, Latin, GrecoRom (which equals GrecoRomance, which is Greek, Latin, and modern Romance languages), Rom(ance), Italian, Spanish, Portuguese, Germanic, German, English, British, OE (Old English), Scandinavian, Russian, Slavic, Hebrew, Yiddish, Arabic, Semitic, Hindi, Japanese.

In what might be called a “processes” group there are the following: Contracts, Expands, Varies, Marks, Vestige, Term(inative) and Nonterm(inative), Coform, Converges, Combines. Searching on Cf returns suffixes with close relationships to other suffixes.

There are several possible strings dealing with grammar: noun, verb, adj(ective), past (tense), pres(ent participle), part(iciple), and many others. Related are agent, instrument, frequentative, diminutive, augmentative, comparative, superlative, pejorative, plural, etc.

In the register group are chemical, scientific, technical, jocular, familiar, informal, intimate.

I tend to explicate so as to recover as much as possible of potentially motivating material from the words. One result of this is that sometimes explicating to a base leads to suffixes that are quite exotic and peripheral, such as the following from Hebrew: -o7, -oh, -os3, -ot6, -oth, -u5.

Nonterminative and Vestigial Suffixes

The notion of nonterminative suffixes may at first seem odd. They tend to occur as nonterminative coforms in sets like {-abil]+, -able]} and {-os2]+, -ous]}: availability, available; generosity, generous. Also, vestiges from earlier, primarily Latin, stems often are explicated as nonterminative suffixes. For instance, -it5, comes from Latin stems and is a good example of evolutionary recycling: authoritative auth+or]2+it]5+ative], dietitian diet+it]5+ian], and puritan pur5+it]5+an]1. The Suffixes table contains 114 nonterminative and vestigial suffixes.

A Note on the Sequence +ic]1+al]1+ly]1

This common suffix sequence illustrates some of the complexities involved in explication. In the Words table 401 words contain the full sequence – as in canonically canon+ic]1+al]1+ly]1, which grows from the series canon, canonic, canonical. But there are more than 480 words in Lexis that end <ically> but for which there are no shorter forms ending in -al1. For instance, we have hydraulic and hydraulically, but no *hydraulical. (The OED does list hydraulical, but labels it obsolete and shows only three citations, stretching from 1664 to 1792.) There are also fifteen words in Lexis that end <ically> but have no shorter form ending in -ic1. For instance, we have farcical and farcically, but no *farcic. (Again, the OED lists farcic, but labels it obsolete and rare and lists only one citation, from 1763.) (There are also more than 464 words that end with +al]1+ly]1, with no preceding +ic1, such as morally.)

The missing forms with final -al1 are due to two features of our lexicon: First, it is regular in English when adding -ly1 to a word ending in -ic1 to insert <al> even if there is no intermediate form ending in -al1, as is the case with hydraulic and hydraulically. And second, though sometimes the two adjectives ending in -ic1 and in -ic1 plus -al1 have slightly different senses, such as historic and historical, usually they are synonymous, in which case the longer form disappears.

Rather than explicating all words that end <ically> consistently as +ic]1+al]1+ly]1, I use two expanded forms to accommodate those missing intermediate forms: -ical and -ally, as in farcical farc+ical] and hydraulically hydr+aul+ic]1+ally]. Also, since dictionaries do not always agree whether certain forms exist, and since users are free to coin missing forms to fill emergent needs, I’ve decided to set the universe of inclusion at the Words table itself – that is, if a form is not included in Words, for the sake of this analysis, it does not exist.
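
The decision described here depends only on which shorter forms are present in the Words table, so it can be sketched as a lookup. The Python below is an illustration, not the procedure actually used to build Lexis. The function name is hypothetical, the tiny lexicon stands in for the Words table, and the sequence returned for the farcically case (+ical]+ly]1) is an inference from the explication of farcical given above.

```python
# A stand-in for the Words table: only these forms "exist".
LEXICON = {
    "canon", "canonic", "canonical", "canonically",
    "hydraulic", "hydraulically",
    "farcical", "farcically",
}

def ically_suffixes(word, lexicon):
    """Choose the suffix sequence for a word ending in <ically>."""
    if not word.endswith("ically"):
        raise ValueError("expected a word ending in <ically>")
    stem = word[: -len("ically")]
    has_ic = stem + "ic" in lexicon
    has_ical = stem + "ical" in lexicon
    if has_ic and has_ical:
        return "+ic]1+al]1+ly]1"   # both intermediate forms are attested
    if has_ic:
        return "+ic]1+ally]"       # no form in -ical, so use expanded -ally
    if has_ical:
        return "+ical]+ly]1"       # no form in -ic, so use expanded -ical
    return None                    # neither form attested; left undecided

print(ically_suffixes("canonically", LEXICON))    # +ic]1+al]1+ly]1
print(ically_suffixes("hydraulically", LEXICON))  # +ic]1+ally]
print(ically_suffixes("farcically", LEXICON))     # +ical]+ly]1
```

The check is purely a matter of spelling. It does not confirm that a shorter form really contains -ic1 or -al1, so its results would still need review by eye.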

* * * * * *

Available Text Files

The on-line version of the database has limited filtering and sorting capabilities. If you need to do more elaborate filtering and sorting – using Boolean operators, for instance – feel free to download the text files of the tables and load them into your database or spreadsheet program. (Download zipped tables: 1.54 MB)

The only requirement is that any public use of the Lexis database be publicly acknowledged and documented.

A Work in Progress

It is important to realize that the Lexis database can only be seen as very tentative. My hope is that later work will lead to more formal and systematic ways of answering the questions this work raises. Changes and corrections are expected and encouraged, and the word tentative in the first sentence of this paragraph is carefully and deliberately chosen.

In a project of this size, one that has stretched over so many years, a major problem is maintaining consistency of analysis. If in using Lexis you discover inconsistencies of any kind, please point them out to me at englishspelling@charter.net. Beyond that, there is an immense potential for all kinds of errors, from pesky typos to just plain mistaken explications. I would appreciate hearing about any of those you might discover. As inconsistencies and errors are caught and corrected, we plan to update the tables on the site. I would also appreciate any suggestions or comments.
