Introduction to the Lexis Database
2011 Upgrade
At the very least we need a systematic
catalog of the elements of American English. Bound bases in particular
remain undescribed in any useful way. We need a systematic catalog of
them, just as we need a thorough catalog of the functioning sets of
coelements in the language. It would be interesting and useful to have
full explications of a large sample of the American English lexicon.
American English Spelling,
p. 462.
An Overview
The Lexis database is an attempt to address the
needs outlined above, especially the final one. It is essentially a lengthy
exercise in explication – that is, the analysis of written words into (i) their
written parts that contribute semiotic or syntactic sense to them, (ii) various
particles and vestiges, (iii) historical processes such as assimilation that
have affected their spelling, and (iv) the procedures that spellers must follow
in spelling them. The following explication of sufficiently can
illustrate: [su/b+f+fic/e1+ient]+ly]1. This
explication analyzes sufficiently into the prefix sub-, changed to the spelling <suf>
via the historical process of assimilation with the letter <b> – and the sound
[b] – being deleted and replaced with <f>, followed by the base +fice1.
The number 1 following the bound base +fice indicates that
this base is the first of at least two homographic bases spelled <fice>. The
virgule indicates that when the expanded suffix -ient was added to
suffice, it did so via the procedure of silent final <e> deletion and that
when the suffix -ly1 was added to sufficient, it did so via
the procedure of simple addition.
Explication speaks to Ferdinand de Saussure’s distinction between the
arbitrariness and the relative motivation of the linguistic sign. Saussure
divided the sign into the signifier (its expression) and the signified (its
content). He makes his point unequivocally:
The bond between the signifier and the signified is arbitrary.
Since I mean by the sign the whole that results from the associating of the
signifier with the signified, I can simply say: the linguistic sign is
arbitrary. (Course in General Linguistics, 67, his emphasis).
This much has become almost mantric in modern linguistics. However, Saussure
went on to draw a distinction between this radical arbitrariness and a more
orderly quality that he called motivation:
Some signs are absolutely arbitrary; in others we note, not
its complete absence, but the presence of degrees of arbitrariness: the sign
may be relatively motivated. (131, again his emphasis).
For example, a simplex word like, say, six is, in Saussure’s view,
absolutely arbitrary in its association of expression and content, as is
evidenced by the fact that other languages have quite different expressions for
conveying the content “six.” However, a complex word like sixteen is
not absolutely arbitrary and can be said to be at least relatively motivated
because it can be analyzed into two components, six and teen,
which he calls syntagms and I call elements. Each of these elements relates
sixteen with several other words in the language: Six relates it
paradigmatically to sixty, sixth, twenty-six, and so
on. Six also relates it via scalar metonymy to seven,
seventy, seventh, twenty-seven – and to all other such
numbers in the number system. The element teen relates sixteen
to such words as thirteen, fourteen, teenage,
fifteenth, even teenybopper. More remotely teen “10”
relates sixteen to words like thirty and fifty, which
contain the base ty1 “times ten”. These paradigmatic relationships
provide the orderliness that Saussure calls relative motivation. Saussure says
that “motivation varies, being always proportional to the ease of syntagmatic
analysis and the obviousness of the meaning of the subunits present” (132).
Explication is meant to increase “the ease of syntagmatic analysis” and to
heighten “the meaning of the subunits.”
Saussure also argues that
Everything that relates to language as a system must, I am
convinced, be approached from this viewpoint, which has scarcely received the
attention of linguists: the limiting of arbitrariness. . . . [T]he whole system
of language is based on the irrational principle of the arbitrariness of the
sign, which would lead to the worst sort of complication if applied without
restriction. But the mind contrives to introduce a principle of order and
regularity into certain parts of the mass of signs, and this is the role of
relative motivation. (133)
It is precisely these effects, “the limiting of arbitrariness” and the
concomitant heightening of motivation, that explication works to increase in the
written lexicon, as part of the search for increased order and regularity.
Explication and some of its problems are discussed in chapter two of my
American English Spelling (Johns Hopkins, 1988). These issues are discussed
further in the article “Explication, Evolution, and Orthography” in the Short
Articles section of this website. Some of the major problems with the practice
of explication are discussed in the final section of that article, “Problems of
Explication.”
The 2011 Version of Lexis
When I explicated the words in
the original version of Lexis, I was most concerned with an economy that
emphasized two things: (i) minimizing the number of parts and (ii) minimizing
the number of complex procedures involved in word-formation – a complex
procedure being defined as any procedure other than simple addition. Thus, I
tended to avoid explications that required silent final <e> deletion other than
in the inflection of free stems that ended in silent final <e> – as in, say,
fired fir/e+ed]1. I also tried to avoid running counter to history, trying,
for instance, to reflect what is taken to be a word’s original formation. Also,
minimizing the number of parts led to much merging of bases with adjacent
particles and vestiges. Over time I grew uneasy with that whole approach: All of
that merging, in particular, tended to obscure the identity of bases, which
provide the semiotic core of words. My approach to language is at heart
phenomenological, concerned with how language is experienced by its users. Thus,
in this 2011 version I’ve emphasized economy and history less and the salience
of bases more. I have become more willing to make use of unhistorical
explications – that is, explications that run counter to etymology: For
instance, given a pair of coforms – one with silent final <e>; the other
identical but without the final <e> – the major principle is this: If
explicating to the coform with silent final <e> is orthographically justified,
do so. Only explicate to the form without final <e> if you cannot justify a
silent final <e> deletion. The form with silent final <e> is privileged because
it is usually terminative and thus usually has end-focus and emphasis. The form
without final <e> is usually either initial or medial, where it receives less
emphasis.
Thus, in the {phote, phot} set of coforms we have photograph now explicated to
phot/e+o4+graph, photo explicated to phot/e+o]3, and photic to phot/e+ic]1, even
though there is no free form *phote and from the historical point of view there
was quite surely no <e> deletion involved in the formation of any of those
words. On the other hand, isophot must explicate to is2+o4+phot, since there
would be no way orthographically to justify <e> deletion – that is, no
*is2+o4+phot/e.
Deleting silent final <e>’s before particles like o4 – as in phot/e+o4+graph –
requires a modest revision of the usual final <e> deletion rule, which restricts
<e> deletion to cases where one is adding a suffix that starts with a vowel to a
free stem ending in silent final <e>. (For more details see chapter eight of my
American English Spelling, especially section 8.2.)
My rather generous expansions of suffixes continue the process mentioned in
"Note of Suffixes" in Glare's The Oxford Latin Dictionary: "One of the most
characteristic features of Latin suffixes is their growth by misdivision: for
instance, the elementary suffix -nus gives rise to a group of secondary suffixes
-ânus, -înus, -ernus, -tinus" (xxiii). Glare's use of the word misdivision is
interesting: I would describe it as the natural change and growth of the
language, driven primarily, I suspect, by the contention between element and
syllable boundaries. Beyond the common expansion of suffixes and the less common
expansion of prefixes, I have resisted expanding bases, leaving independent
those particles and vestiges that follow bases and don’t attach to expanded
suffixes – an approach that greatly increases the number of plus signs in some
explications and also puts off to a later date the question of what, if
anything, to do about those particles and vestiges.
The Four Lexis Data Tables
The Lexis database contains four data tables. The
first, Words, contains the lexicon of 129,029 words, each with its explication,
as illustrated above with sufficiently. The second table, Bases, contains the
16,980 free and bound bases contained in those explications. The third table,
Prefixes, contains the 245 prefixes from the explications in Words, and the
fourth, Suffixes, contains the 1,197 suffixes. More detailed introductions to
the tables are given below.
Using Lexis
One typical use of Lexis would be to find a word's explication in
the Words table and then, shifting to the other tables, finding more about its
constituent elements, processes, and procedures. If, on the other hand, you are
interested not in elements but in simple letter strings, you could, for
instance, search for the string <mpt> in the Word field, which would return a
set of 145 words, from ademption to unkempt. If you were interested only in
words in which <mpt> is in a single element, typing <mpt> in the Explication
field would return 114 words from ademption to transumption.
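The difference between the two searches can be sketched in a few lines of Python. This is only an illustration: the rows are explications quoted elsewhere in this introduction rather than the full Words table, and the function name and row layout are assumptions of the sketch, not part of Lexis.

```python
# A few (Word, Explication) pairs quoted in this introduction; the real
# Words table is far larger and is not reproduced here.
ROWS = [
    ("canonically", "canon+ic]1+al]1+ly]1"),
    ("farcical", "farc+ical]"),
    ("hydraulically", "hydr+aul+ic]1+ally]"),
    ("photic", "phot/e+ic]1"),
]


def search(rows, string, field):
    """Return the words whose Word or Explication field contains `string`."""
    index = {"Word": 0, "Explication": 1}[field]
    return [row[0] for row in rows if string in row[index]]


# <ical> as a plain letter string, wherever it falls in the word.
print(search(ROWS, "ical", "Word"))
# <ical> only where no boundary or other mark breaks it up.
print(search(ROWS, "ical", "Explication"))
```

In this small sample the Word search finds canonically, farcical, and hydraulically, while the Explication search finds only farcical, the one word in which <ical> sits inside a single element.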
In the Comment fields in the Bases, Suffixes, and Prefixes tables there are
keywords that can provide some informative searches and that are listed below for
the separate tables. The Prefixes, Bases, and Suffixes tables can be sorted on the
Instances field to isolate low and high frequency forms.
Introduction to the Words Table
The only fields in the Words table are Word and Explication. The collection of
words, though quite extensive, is meant to be a sample. One special point: Many
of the words in Lexis have homographic forms, of which Words includes only a
few. A second point: Lexis contains all the words from the Index of Words of my
American English Spelling, which means that it contains an unusual number of
words with spellings that are odd or peripheral to our English spelling system,
such as ngaio, a word borrowed from Maori referring to a type of small New
Zealand tree.
The following symbols are used in the Explications field: Plus marks indicate
internal boundaries between elements, vestiges, and particles. Virgules indicate
that the following letter is to be deleted. Left square brackets mark the
beginning of prefixes; right square brackets mark the end of suffixes. Numbers
following elements and particles discriminate homographic forms.
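As a rough illustration of how these symbols work together, the Python sketch below recovers a word's spelling from its explication and labels the parts between plus signs. The function names and the label "unmarked" are inventions of the sketch, and it assumes that every virgule deletes exactly one following character.

```python
import re


def spelling(explication):
    """Recover the written word from an explication string."""
    # A virgule deletes the letter that follows it, so drop both characters.
    kept = re.sub(r"/.", "", explication)
    # Plus signs, square brackets, and homograph numbers are not spelled.
    return re.sub(r"[+\[\]0-9]", "", kept)


def parts(explication):
    """Split an explication at its plus signs and label each part."""
    labeled = []
    for part in explication.split("+"):
        if part.startswith("["):
            kind = "prefix"
        elif "]" in part:
            kind = "suffix"
        else:
            kind = "unmarked"  # base, particle, vestige, or assimilated letter
        labeled.append((part, kind))
    return labeled


print(spelling("[su/b+f+fic/e1+ient]+ly]1"))
print(parts("[su/b+f+fic/e1+ient]+ly]1"))
```

Run on the explication of sufficiently, spelling gives back "sufficiently", and parts labels [su/b as a prefix, ient] and ly]1 as suffixes, and f and fic/e1 as unmarked. Telling bases from particles and vestiges would require the Bases table, which the sketch does not consult.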
Introduction to the Bases Table
The Bases table contains the following seven fields: (i) Bases, (ii) Examples,
(iii) Instances, (iv) Free, (v) Sense Links, (vi) Comments, and (vii) Relatives.
(i) and (ii). The Base and Examples fields are pretty much self-explanatory,
except that the Base field also lists particles, which are tagged “Particle” in
the Comment field. And the Examples field lists some (often all) of the words in
Lexis that contain that base.
(iii) The Instances field gives the number of words in Lexis in which the base
in question appears. A caveat: As changes were made in the Words and Bases
tables, the counts changed, and it’s likely that some of the listed counts are
off a bit. They are accurate enough for discussions of general patterns and
relative frequency, but if you need precise figures, I suggest filtering the
Words table on the base in question to double-check the count.
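For anyone working from the downloaded tables, such a double-check might look something like the following Python sketch. It assumes that a base is written in the Bases table the way it is named in this introduction (fice1 for the part written fic/e1, phote for phot/e), and the function names are hypothetical.

```python
def elements(explication):
    """List the elements of an explication with the marks stripped out."""
    cleaned = []
    for part in explication.split("+"):
        # Keep letters and homograph numbers; drop brackets and virgules,
        # so that "fic/e1" is read as fice1 and "ly]1" as ly1.
        cleaned.append("".join(ch for ch in part if ch not in "[]/"))
    return cleaned


def count_instances(rows, element):
    """Count the words whose explication includes `element`."""
    return sum(1 for _, explication in rows if element in elements(explication))


# Explications quoted in this introduction, not the full Words table.
ROWS = [
    ("photograph", "phot/e+o4+graph"),
    ("photo", "phot/e+o]3"),
    ("photic", "phot/e+ic]1"),
    ("isophot", "is2+o4+phot"),
]

print(count_instances(ROWS, "phote"), count_instances(ROWS, "phot"))
```

Because the comparison is made element by element rather than on raw letter strings, the coforms phote and phot are counted separately: three words for the first and one for the second in this sample.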
(iv) The Free field tags free bases. Untagged bases are bound. The tagging was
determined by filtering to words in the Words table whose explications do not
include a plus sign marking an internal boundary.
(v) The Sense Links field summarizes the various senses carried by that base
throughout its history, often running back to its proposed Proto-Indo-European
(PIE) root. It does not attempt systematically to present the evolution of those
senses. Rather, it tries to show some of the senses that have been associated
with words containing the base over the centuries. Thus, it suggests how various
metaphoric and metonymic relationships have produced the various senses. (For
more on metaphor and metonymy, see the article “Explication, Evolution, and
Orthography” in the Short Articles section of this website.)
Searching for a semicolon in the Sense Links field returns all bases assumed to
have evolved from a PIE root. The senses of the roots are given to the left of
the semicolons in the Sense Link field. These assumed senses present a major
complication: Primitive languages tend to be concrete and specific affairs,
with abstraction and generality developing later. But in the comparative method
used to reconstruct the senses of PIE roots, the need to find common themes that
link cognate words from often widespread languages leads to proposed senses for
roots that are often much more vague and diffuse, much more general and abstract
than the terms actually must have been in PIE. In spite of this probably
ahistoric generality, the assumed PIE senses can help uncover the metaphoric and
metonymic links that concern us.
(vi) The Comments field contains brief notes on that base’s structure, such as
the contraction and expansion of earlier forms. In Comments the keyword
Imitative covers a multitude of types. Sometimes it’s a straightforward
imitation of a natural sound, such as in moo and caw, or oh and ooh. Sometimes
it is not clear what is being imitated, as with jab and jam, where it is
perhaps more like sound symbolism or phonaesthesia. Other keywords in the
Comments field: Particle, Reduplicative, Folk ety(mology), Contracts, Expands,
Alters, Varies, Nonterm(inative) and Term(inative) coform, Eponym, Merges,
Redivides, ooo (Of Obscure Origin), Past (tense), Past participle, Archaic,
Coined by, Converts, Trademark, Plural; and the source languages Spanish,
French, Anglo-Norman, British, Scots, Dutch, Arabic, Chinese, Hindi, Swedish,
Norwegian, Finnish, Italian, Portuguese, Latin, Greek. The Comments field also
contains a few warnings about unusual deletions, tagged with the keyword Check –
as in “Check +vi/e & +v/i/e” (at vie), necessary to catch the inflected forms
vied and vying.
I’m sensitive to the fact that the English lexicon and its morphology are
evolving complex systems, so an important part of explication is the attempt to
capture some sense of the direction that things are taking. Much of this attempt
can be seen by filtering on the keywords Expands and Contracts in the Comment
field. In the lists that are returned will be many cases in which historical
bases have been expanded by the accretion of vestiges from earlier stems or
contracted via the metonymic part-for-whole relationship, especially in words
from the scientific-technical register.
(vii) The Relatives field describes unifying links among the base in question
and various other elements. Assumed PIE roots for bases are indicated in the
Relatives field, preceded by an asterisk, as in *skep. A few of these roots are
actually Latin, Greek, or Germanic roots, not Proto-Indo-European.
It is important to remember that the PIE roots are all reconstructions, arrived
at by comparing words in the various languages assumed to have descended from
the mother tongue and applying the rules of sound change that grammarians have
developed. No direct traces of PIE exist. All of the roots are assumptions. In
his American Heritage Dictionary of Indo-European Roots Calvert Watkins uses the
word perhaps often, as is sometimes reflected in Bases via question marks. As in
any form of archaeology, etymological conclusions are often based on little hard
evidence, and conclusions and assumptions can change drastically with the
discovery of new evidence. Thus, there can be, and is, considerable
disagreement, not only about whether a given modern base descends from a certain
PIE root, but also about the form and semiotic content of that root (and at
times its very existence). In compiling the list of roots for the Relatives
field, I’ve taken what might be called a “loose constructionist” approach – that
is, since I am more interested in finding plausible unifying links than in
determining certain true cognates, if one source finds a
certain link plausible, even though others do not, I tend to include it.
In the Relatives field, due to font limitations, some special characters have
substitute symbols: Schwa is represented with the “at” sign, @; long vowels are
represented with capital letters. The superscript numerals used by Watkins to
distinguish among homographic PIE roots are here represented with regular
numerals.
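If you want to compare Relatives entries with printed sources, the substitutions can be reversed mechanically. The Python sketch below is one way to do it; it assumes that only the five vowel letters occur as capitals in roots, and the sample inputs are made-up strings in the style described above, not entries copied from the Bases table.

```python
# Substitutions described above: "@" for schwa, capitals for long vowels.
SUBSTITUTES = {
    "@": "ə",
    "A": "ā", "E": "ē", "I": "ī", "O": "ō", "U": "ū",
}


def conventional(root):
    """Rewrite a root from the Relatives field in conventional notation."""
    return "".join(SUBSTITUTES.get(ch, ch) for ch in root)


# Hypothetical inputs, chosen only to show both kinds of substitution.
print(conventional("*p@ter"))
print(conventional("*dhE"))
```

The sketch leaves the regular numerals as they are, since nothing in a bare string tells a homograph number from any other digit.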
The Relatives field, besides giving roots when available, suggests some of the
other elements that are related to the current base. The related elements are
marked with leading plus signs, which can be thought of as substitutes for
italics. The treatment does not pretend to be exhaustive, whatever exhaustive
might mean here. Of course, all bases with the same PIE root are assumed to be
related, so bases returned by a search in the Relatives field for a specific
root are assumedly related to one another to some degree, and in general the
more similar are the Senses fields, the more closely related are the bases. To
pursue these relationships further you should consult the family trees provided
by Watkins, either in his The American Heritage Dictionary of Indo-European
Roots or in his Indo-European appendices to the 1st, 3rd, and 4th editions of
the American Heritage Dictionary.
In the Relatives field, the references are usually to other bases. When prefixes
are listed – for instance, at pert1 there is “+[a4”, the prefix [a4- in the
Prefixes table – that means that there is at least one word containing that base
that is an aphetic form of an earlier word with that prefix, as pert1 itself,
which is from ME apert [a4+pert. There are a number of words that have lost an
initial euphonious e4 particle, which was added in French and Spanish because
those languages tend to avoid the initial clusters <sc>, <sp>, <sq>, and <st>.
Additional instances of lost e4 can be found by filtering the Bases field to
bases with initial <sc>, <sp>, <sq>, or <st> and examining the returned bases to
exclude any that are not French or Spanish adaptations.
Acknowledgements
This is very much a derivative study, based on the primary
research of others: James Murray, Henry Bradley, Calvert Watkins, Julius Pokorny,
Eric Partridge, and the editors of The Barnhart Dictionary of Etymology.
Although the explications are my own work, much of the Bases table, especially
the Sense Links and much of the Relatives fields, gathers together findings from
many earlier studies.
The great majority of PIE roots are drawn from Calvert Watkins’ The American
Heritage Dictionary of Indo-European Roots (Boston: Houghton Mifflin, 1985). A
few are drawn from the slightly different and shortened versions of Watkins’
list given as appendices to the 1st, 3rd, and 4th editions of the American
Heritage Dictionary. Even fewer are drawn from The Barnhart Dictionary of
Etymology (H. W. Wilson, 1988). A few have come from Partridge’s Origins, fewer
from the interactive version of Julius Pokorny’s seminal Indogermanisches
etymologisches Wörterbuch (Bern, 1959), available at Leiden University’s website
http://www.indo-european.nl/.
Those roots from Watkins are listed with no annotation beyond the root itself
and its presumed sense recorded in the Sense field, followed by a semicolon.
Those from Barnhart have the Pokorny page number, as in “(Pok 345).” Those from
Partridge are labeled “Partridge,” and those few from Leiden are labeled
“Leiden.” Some of those from the other sources involve roots listed by Watkins
but without the bases and words in question being listed as descendants in
Watkins, though they are so grouped in Partridge, even if he does not directly
mention any PIE root.
Introduction to the Prefixes Table
The Prefixes table contains the 245 prefixes found in the explication of the
129,029 words in Words. It is a rather long and eclectic list that includes,
among other things, some prefixes from very rare adoptions – such as the plural
markers ema- (emalangeni) and ma- (makota) from Africa, and the noun markers
mi- (mikado) and sa1- (samurai) from Japanese. Such prefixes are obviously quite
peripheral to the English prefix system, but there they are in their adopted
words. Since the primary motive of explication is to highlight potential
unifying links in the English lexicon, it seems best to explicate to these alien
prefixes, rare and exotic though they may be.
In addition to the Prefix field, the Prefixes table includes the regular
Examples, Instances, and Comment fields. In the Comment field, since many
prefixes carry considerable semiotic content, their senses are given, in
quotation marks: [ultra- “Beyond, beyond the norm.” The Comment field also
contains a number of keywords: Contracts, Alters, Expands, Varies, and Marks and
Forms, which show syntactic function. An especially important keyword is
Assimilation, which tags assimilated forms of several prefixes. The Comment
field also contains the keyword Search, followed by the syntax for filtering to
given assimilated forms – for example, at [ac1-, you find “Search [a/d+c+.”
Finally, the Comment field lists a number of source languages and country names:
Old English, German, Latin, French, Spanish,
Greek, Italian, Romance, Russian,
Japanese – and from Africa: African, Swaziland, Bemba,
Bantu, Zaire, Lesotho.
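The Search strings for assimilated forms follow a regular pattern, which the Python sketch below tries to capture. It assumes that assimilation always deletes the last letter of the prefix and adds the replacement letter as a separate part, as in the two patterns that appear in this introduction ([a/d+c+ and [su/b+f+). The function names are hypothetical.

```python
def assimilation_search(prefix, letter):
    """Build a search string for a prefix whose last letter is assimilated."""
    # The virgule marks the deleted final letter; the replacement letter
    # follows as its own plus-delimited part.
    return "[" + prefix[:-1] + "/" + prefix[-1] + "+" + letter + "+"


def assimilated(rows, prefix, letter):
    """Return the words whose explication contains that pattern."""
    pattern = assimilation_search(prefix, letter)
    return [word for word, explication in rows if pattern in explication]


# Explications quoted in this introduction, not the full Words table.
ROWS = [
    ("sufficiently", "[su/b+f+fic/e1+ient]+ly]1"),
    ("photic", "phot/e+ic]1"),
]

print(assimilation_search("ad", "c"))
print(assimilated(ROWS, "sub", "f"))
```

The pattern is matched anywhere in the explication rather than only at the start, on the assumption that an assimilated prefix may follow another prefix.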
Definition of Prefixes
In his English Word Formation Hans Marchand uses a
rather restrictive definition of prefix (“Prefixes are bound morphemes which are
preposed to free morphemes” [129]) and discusses only 65 (129-208). At the other
extreme, in his Origins Eric Partridge uses Webster’s much less restrictive
definition of prefix: “One or more letters or syllables combined or united with
the beginning of a word to modify its signification, as pre- in prefix, con- in
conjure” (821). Partridge goes on to say that “Strictly, a prefix should consist
of either a preposition or adverb”, and he omits what he calls the “false
prefixes of science”, which he describes as not prefixes, but abbreviations.
Still, he lists more than 340 prefixes, including many homographic and variant
forms – for example, he lists seventeen different prefixes spelled <a>.
So the question of what a prefix is remains somewhat undecided. Dictionaries do
not agree on the distinction between prefixes and bases, especially those
usually bound bases called “combining forms.” For instance, though the AHD uses
electr+ as an example of a combining form (at “combining form”), in the main
word list it is labeled “prefix,” as are all other combining forms. On the other
hand, the RHUD and W3 both distinguish carefully between prefixes and combining
forms. At a different extreme the editors of Prefixes and Other Word-Initial
Elements of English collapse the distinction completely, speaking only of
“word-initial elements” in their list of nearly 3,000 forms.
Elements signifying numerical values can illustrate the confusion: In W3 bi-
“two” is labeled a prefix, but tri- “three” is a combining form. RHUD labels
both as combining forms; AHD labels both as prefixes. For the sake of
simplicity, I treat all numerical elements as combining forms – that is, bases,
usually bound – and restrict prefixes essentially to prepositions (in2-, ad-),
negatives (in1-, non-, un1-), adverbs (se-, per1-), a few derivationals (en1-
and be-), and even fewer inflectionals, such as those in samurai and mikado.
There can be indecision whether the Romance euphonic <e>, as in escalate, is an
initial particle or a prefix. Arguing against the former is the fact that all
other particles are medial, not initial. Arguing against the latter is the fact
that euphonic <e> does not have any semiotic content. For now I explicate the
euphonic <e> as a particle, listed as e4 in the Bases table and tagged in the
Comment field as “Particle.”
Converted Prefixes
The practice of synecdoche and conversion in English has led
to several cases in which one-time prefixes have been converted to bases. Some
examples: The base com5 contracts communication (as in intercom), and com7
contracts commissioned (as in noncom). Both convert the prefix [com- to a base.
In abo ab2+o]3, ab2 is clipped from aborigine [ab1+orig+in]02+e]3. Selsyn
sel4+syn1 contracts “sel(f) + syn(chronous)” [syn+chron+ous]. In cistron
cistr+on]08 the base cistr contracts the phrase “cis-trans test”. Since [cis-
and [trans- are both prefixes, the bound base cistr is another example of
conversion. The very common free base pro1 converts the prefix pro1- in
professional.
Often the prefix merges with part or all of the following base in the
conversion: In propane prop2+ane]3 the base prop2 contracts propionic
[pro2+pion2+ic]1. In praetor prae+tor], pretor pre+tor] and their derivatives
the prae+ and pre+ convert the Latin prefix pre- to a base and absorb the <e>
from a now lost Latin base meaning “go”. Combo comb3+o]3 contracts combination,
with the prefix com-. Similar examples: comfy, commie, comp3, condo. Pregnant
pregn+ant]1 is ultimately from a Latin source that contains the prefix [prae-
plus the source of our modern base gn2 “born”. Related are slang terms such as
preggers, preggy, and preggo.
In the other direction, bases are sometimes converted to prefixes, as is
apparently the case in sovereign, which I explicate as [sove+reign. The word has
evolved from ME soverain, influenced by reign. The ME soverain is from OF
soverain, from Vulgar Latin *superanus, which would be the Latin super “above”
plus the adjective suffix -anus. Sovereign is a good example of folk etymology:
As the final syllable was respelled <reign>, the respelled Romance suffix -ain
was converted from suffix to base. It would also follow that the first syllable,
which started out as a base, has converted to a prefix, a variant of the English
prefix super-.
The explication of prefixes that descend from the Latin preposition ad poses
some questions. At one extreme are those words in which the Latin preposition
had already become used as an assimilated prefix in Latin, words like accent and
accident. At the other extreme are words from French phrases with the contracted
French preposition a, with no double consonant: abase, abate, abut. The former
I explicate as [a/d+; the latter as [a3+. Problems arise with words between these
extremes, with words like acclimate, which developed the <cc> in French, and
accompany, which developed it in English. In such words there never was an
assimilation of ad-. But rather than positing a separate prefix ac-, I recognize
the power of analogy and folk etymology and treat such words as if they were
assimilated forms of ad-. Basically, if there is a double consonant, or its
equivalent, I'm inclined to explicate to [a/d+-, motivated by analogy. If there
is no double consonant, I'm inclined to explicate to the closely related
[a3+.
Introduction to the Suffixes Table
The Suffixes table contains the 1,197 suffixes explicated out in Words.
In the Comment field there are a number of potentially useful search and filter
strings. The following are the more common ones for language or region of origin: Greek,
Latin, GrecoRom (which equals GrecoRomance, which is Greek, Latin, and modern
Romance languages), Rom(ance), Italian, Spanish, Portuguese, Germanic, German,
English, British, OE (Old English), Scandinavian, Russian, Slavic, Hebrew,
Yiddish, Arabic, Semitic, Hindi, Japanese.
In what might be called a “processes” group there are the following: Contracts,
Expands, Varies, Marks, Vestige, Term(inative) and Nonterm(inative), Coform,
Converges, Combines. Searching on Cf returns suffixes with close relationships
to other suffixes.
There are several possible strings dealing with grammar: noun, verb,
adj(ective), past (tense), pres(ent participle), part(iciple), and many others.
Related are agent, instrument, frequentative, diminutive, augmentative,
comparative, superlative, pejorative, plural, etc.
In the register group are chemical, scientific, technical, jocular, familiar,
informal, intimate.
I tend to explicate so as to recover as much as possible of potentially
motivating material from the words. One result of this is that sometimes
explicating to a base leads to suffixes that are quite exotic and peripheral,
such as the following from Hebrew: -o7, -oh, -os3, -ot6, -oth, -u5.
Nonterminative and Vestigial Suffixes
The notion of nonterminative suffixes may
at first seem odd. They tend to occur as nonterminative coforms in sets like
{-abil]+, -able]} and {-os2]+, -ous]}: availability, available; generosity,
generous.
Also, vestiges from earlier, primarily Latin, stems often are explicated as
nonterminative suffixes. For instance, -it5 comes from Latin stems and is a
good example of evolutionary recycling: authoritative auth+or]2+it]5+ative],
dietitian diet+it]5+ian], and puritan pur5+it]5+an]1. The Suffixes table
contains 114 nonterminative and vestigial suffixes.
A Note on the Sequence +ic]1+al]1+ly]1
This common suffix sequence illustrates
some of the complexities involved in explication. In the Words table 401 words
contain the full sequence – as in canonically canon+ic]1+al]1+ly]1, which grows
from the series canon, canonic, canonical. But there are more than 480
words in Lexis that end <ically> but for which there are no shorter forms ending
in -al1. For instance, we have hydraulic and hydraulically, but no *hydraulical.
(The OED does list hydraulical, but labels it obsolete and shows only three
citations, stretching from 1664 to 1792.) There are also fifteen words in Lexis
that end <ically> but have no shorter form ending in -ic1. For instance, we have
farcical and farcically, but no *farcic. (Again, the OED lists farcic, but
labels it obsolete and rare and lists only one citation, from 1763.) (There are
also more than 464 words that end with +al]1+ly]1, with no preceding +ic1, such
as morally.)
The missing forms with final -al1 are due to two features of our lexicon: First,
it is regular in English when adding -ly1 to a word ending in -ic1 to insert
<al> even if there is no intermediate form ending in -al1, as is the case with
hydraulic and hydraulically. And second, though sometimes the two adjectives
ending in -ic1 and in -ic1 plus -al1 have slightly different senses, such as
historic and historical, usually they are synonymous, in which case the longer
form disappears.
Rather than explicating all words that end <ically> consistently as
+ic]1+al]1+ly]1, I use two expanded forms to accommodate those missing
intermediate forms: -ical and -ally, as in farcical farc+ical] and
hydraulically hydr+aul+ic]1+ally].
Also, since dictionaries do not always agree whether certain forms exist, and
since users are free to coin missing forms to fill emergent needs, I’ve decided
to set the universe of inclusion at the Words table itself – that is, if a form
is not included in Words, for the sake of this analysis, it does not exist.
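The decision just described can be summarized in a short Python sketch, in which a set of words stands in for the Words table as the universe of inclusion. The function name is hypothetical. The ending returned for words like farcically (+ical]+ly]1) is an inference from the explication given for farcical, not something stated above, and the sketch offers no analysis when neither shorter form is present.

```python
def ically_ending(word, words):
    """Pick a suffix analysis for a word ending in <ically>.

    `words` plays the role of the Words table: a form exists only if it
    is in this set.
    """
    if not word.endswith("ically"):
        return None
    stem = word[: -len("ically")]
    has_ic = stem + "ic" in words
    has_ical = stem + "ical" in words
    if has_ic and has_ical:
        return "+ic]1+al]1+ly]1"
    if has_ic:
        return "+ic]1+ally]"   # no intermediate form in -al1
    if has_ical:
        return "+ical]+ly]1"   # no shorter form in -ic1 (inferred ending)
    return None                # neither shorter form: not covered here


WORDS = {
    "canon", "canonic", "canonical", "canonically",
    "hydraulic", "hydraulically",
    "farcical", "farcically",
}

for w in ("canonically", "hydraulically", "farcically"):
    print(w, ically_ending(w, WORDS))
```

Adding hydraulical to the set would change the answer for hydraulically to the full sequence, which is the practical effect of taking the Words table as the universe of inclusion.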
* * * * * *
Available Text Files
The on-line version of the database has limited filtering
and sorting capabilities. If you need to do more elaborate filtering and sorting
– using Boolean operators, for instance – feel free to download the text files
of the tables and load them into your database or spreadsheet program. (Download
zipped tables: 1.54 MB)
The only requirement is that any public use of the Lexis database be publicly
acknowledged and documented.
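As one possible way of doing this, the Python sketch below loads Word and Explication rows into an in-memory SQLite table and runs a filter with Boolean operators. The layout of the downloadable files is not described here, so the tab-delimited format with a header row is an assumption, as are the table and column names. The sample rows are explications quoted in this introduction.

```python
import csv
import io
import sqlite3

# Stand-in for a downloaded text file; the tab-delimited layout with a
# header row is an assumption, and the rows are quoted from this introduction.
WORDS_TXT = (
    "Word\tExplication\n"
    "canonically\tcanon+ic]1+al]1+ly]1\n"
    "farcical\tfarc+ical]\n"
    "hydraulically\thydr+aul+ic]1+ally]\n"
    "photic\tphot/e+ic]1\n"
)


def load_words(text):
    """Load tab-delimited Word/Explication rows into an in-memory table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE words (word TEXT, explication TEXT)")
    db.executemany("INSERT INTO words VALUES (?, ?)", reader)
    return db


db = load_words(WORDS_TXT)
# Boolean filter: <ical> in the spelling but not within a single element.
query = (
    "SELECT word FROM words "
    "WHERE word LIKE '%ical%' AND explication NOT LIKE '%ical%' "
    "ORDER BY word"
)
split_ical = [row[0] for row in db.execute(query)]
print(split_ical)
```

To work from a real file, the io.StringIO object could be replaced with an opened file, for example open(path, newline=""), once the actual delimiter and header have been checked.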
A Work in Progress
It is important to realize that the Lexis database can only
be seen as very tentative. My hope is that later work will lead to more formal
and systematic ways of answering the question this work raises. Changes and
corrections are expected and encouraged, and the word tentative in the first
sentence of this paragraph is carefully and deliberately chosen.
In a project of this size, one that has stretched over so many years, a major
problem is maintaining consistency of analysis. If in using Lexis you discover
inconsistencies of any kind, please point them out to me at
englishspelling@charter.net. Beyond that, there is an immense potential for all
kinds of errors, from pesky typos to just plain mistaken explications. I would
appreciate hearing about any of those you might discover. As inconsistencies and
errors are caught and corrected, we plan to update the tables on the site. I
would also appreciate any suggestions or comments.