From LEXICOGRAPHICA, International Annual of Lexicography,
Max Niemeyer Verlag, Tübingen, August 1992, pp. 307-317.
O CORPORA!
by
Thomas M. Paikeday
ABSTRACT: This article seeks to show that, for any kind of lexicographical
work, having a corpus of texts is as important as having a well-developed
body on which to base a study of its anatomy. A special-purpose dictionary
and standard general-purpose North American dictionaries are discussed
as examples of entries, definitions, and illustrative material that
could have benefited from a database capable of supplying a minimum of
20 million citations.
The BBI Combinatory Dictionary of English, an admirable work, has been
the subject of many commendatory articles since its publication in 1986.
It seems only fair that we take a look here at how the BBI would have
been a better work if it had been based on a corpus or database of contemporary
English.
For this purpose, we will compare random samples such as the first
and last half-dozen entries of the BBI with a more complete treatment
of the same using a CD-ROM database of about 20 million words of edited
North American English of the mid-1980's -- "DBase85" for
short.
Such a check, especially of words that have newly come into the language,
should also throw light on the deficiencies of dictionaries such as
Webster's New World Dictionary (Third College Edition, 1988) which was
compiled using a database of fewer than a million citations of the manual
kind.
SPECIAL-PURPOSE DICTIONARY
Here are the first six entries of the BBI in their entirety:
aback adv. taken ~ (I was taken ~) ('I was startled')
abacus n. to operate, use an ~
abandon I n. (D; tr.) reckless, wild ~
abandon II v. (D; tr.) to ~ to (they ~ed us to our fate)
abbreviate v. (D; tr.) to ~ to (Esquire can be ~d to Esq.)
ABC n. as easy, simple as ~.
If we consider high-frequency words and phrases in DBase85, the BBI has
missed many collocations more significant than those it has entered,
under the following headwords: A, aback, abandon n., abandon
v., abandoned adj., abandonment, abase, abashed, abate, abatement, abbreviate,
abbreviation, and ABC.
For comparison purposes, however, we will deal only with entries corresponding
to those in the BBI. In the draft entries given below, brief definitions
in roman precede the collocations in italics; as in the BBI, definitions
are kept to a minimum. For contrastive effect, the collocations of aback
are presented in sentences with minimal differences of meaning and structure.
A particular meaning is given in parentheses (as under ABC) when there
is a significant departure from the main definition. No attempt is made
to pigeonhole the collocations into grammatical and lexical categories
beyond the traditional parts of speech, pigeonholes being considered
less important than what is put into them.
aback adv.: taken aback startled: I was taken ~; I was taken ~ by the
announcement; I was taken ~ when I heard the announcement; I was taken
~ to hear that all the students had failed the test; I was taken ~ at
how many students had failed the test; [rarely] The announcement took
me ~.
[abacus is not one of our entries. Collocations of the type "operate,
use an abacus" are considered "free" combinations.]
abandon n. carefree manner: She danced, played, sang with ~; to spend
money with gay, gleeful, joyous, merry, mindless, wild ~; She partied
with such ~ that she missed the flight home.
abandon v. give up or desert: The captain ordered his men to ~ ship;
They ~ed it to its fate; He ~ed his naval career for one in the army;
to ~ oneself to despair, drinking, grief, pleasure.
abbreviate v. shorten: "Professor" is ~d to "Prof.";
"Professor" is used in ~d form as "Prof."; an ~d
version of the story; to ~ a career, program, schedule, term of office,
visit.
ABC n. the alphabet: as easy, simple as ~; to teach a child her ~s;
the ~s (= basics) of using a word processor.
Frequency counts from DBase85 could be produced to justify inclusion
of each of the entries and collocations above. However, let us be content
with the counts for the five collocations of the first entry aback.
These are listed above in a logical order based on meaning. The frequency
order, however, is different. The most frequent structure, aback + by-clause,
occurs 27 times in DBase85, followed by aback + when (8), aback ending
a clause or sentence (4), aback + to (3), aback + at (2), and aback
used in the active voice (1).
Here is corroborating evidence from a section of The Los Angeles Times
database comparable in size to DBase85: aback + by-clause (41 times);
aback + when (4), aback ending a clause or sentence (3), aback + to
(1), aback + at (1), and aback used in the active voice (3).
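The kind of tally reported above for aback can be sketched in a few lines of code. What follows is only an illustration under stated assumptions: the sample sentences, the regular expression, and the function names are hypothetical, and a real count would run over the full text of a database such as DBase85 rather than a handful of invented sentences.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; a real run would read sentences from a text database.
SAMPLE = [
    "I was taken aback by the announcement.",
    "She was taken aback when she heard the news.",
    "He was taken aback.",
    "We were taken aback to hear the verdict.",
    "They were taken aback at how many had failed.",
    "The announcement took me aback.",
    "Voters were taken aback by the result.",
]

# "took"/"taken", then "aback" within the same sentence, then (optionally)
# the word that immediately follows "aback".
ABACK = re.compile(r"\b(took|taken)\b[^.?!]*?\baback\b(?:\s+(\w+))?", re.IGNORECASE)


def classify(sentence):
    """Return a label for the structure in which 'aback' occurs, or None."""
    m = ABACK.search(sentence)
    if m is None:
        return None
    verb = m.group(1).lower()
    following = (m.group(2) or "").lower()
    if verb == "took":
        return "active voice"
    if following in {"by", "when", "to", "at"}:
        return "aback + " + following
    if following == "":
        return "clause-final"
    return "other"


def count_structures(sentences):
    """Tally the structures found across a list of sentences."""
    counts = Counter()
    for s in sentences:
        label = classify(s)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    for label, n in count_structures(SAMPLE).most_common():
        print(label, n)
```

Sorting the tally by frequency rather than by meaning is what separates the logical order of the draft entry from the frequency order given in the text.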
Now, here are the last six entries of the BBI:
zodiac n. the signs of the ~
zone I n. 1. to establish, set up a ~ 2. a climatic; frigid; temperate;
time; torrid ~ 3. a buffer; combat; communications; danger; demilitarized;
drop; neutral; no-parking; no-passing; occupation; postal; safety; school;
security; towaway; war ~ 4. an erogenous ~ (of the body)
zone II v. (d; tr.) to ~ as (they ~d the area as residential)
zoning n. exclusionary ~
zoo n. at, in a ~ (she works at the ~; wild animals are well cared for
in our ~)
zoom in v. (d; intr.) to ~ on (the camera ~ed in on the podium)
A collocational dictionary based on a good database would also include
entries such as zombie, zonk, and zoological. In fairness to the BBI,
however, we will confine our comparison to the six entries:
zodiac n. a diagram of astrological signs: the 12 signs of the ~; Money-making
is not in my ~; I'm a Libra, what's her ~ sign? She shares the same
~ sign as Marilyn Monroe.
zone n. a special area or region: a buffer ~ separating two warring
countries; a combat ~; a downtown commercial ~; a danger ~; the demilitarized
~ between two warring nations; the end ~ (behind either goal line in
North American football); a 12-mile coastal fishing ~; the mouth, the
behind, and such erogenous ~s of the body; Peace-loving nations wish
to declare their countries nuclear-free ~s; a military occupation ~;
a postal, residential, towaway ~; Tokyo and Seoul are in the same time
~, nine hours ahead of London; the Torrid, Temperate, and Frigid geographic
~s of the earth; in the twilight ~ (= grey area or borderline) of morality;
a war ~.
--v. form or divide into zones: A city is ~d into commercial, residential,
and industrial districts; land ~d (as) agricultural; land ~d for agricultural
use; agriculturally ~d land.
--zoning adj.: ~ approval; a ~ bylaw, change, classification, permit,
restriction, requirement, violation.
zoo n. 1 a place where wild animals are kept for display: a children's
petting ~; Bears should be kept in a ~; A fox at the ~ had rabies; The
animal died in the Bronx ~. 2 a crowded or noisy place: The festival
was like a ~; Large cities are human ~s; Our parliament sometimes becomes
a political ~.
zoom v. 1 move with a buzz or whoosh: The birds ~ed toward the plaza;
The Mirage jet ~ed up and out of sight. 2 move upward like zooming:
Her tennis ranking has ~ed from 100 to 10 in a year; Gas prices ~ed
after the oil embargo; The economy didn't ~ during the war. 3 focus
using a lens that gives quick close-ups: a TV picture that ~s from an
entire football field to the helmet of one of the players; The camera
~ed in on the star of the show.
--adj.: A ~ lens or zoomer is a photographic device for taking quick
close-up shots without having to adjust the focus; There is a ~ telescope
at the top of the tower.
--interj.: Zoom! She was an instant celebrity.
The style of the above presentation may not exactly suit the BBI's
scheme of things. But the question is, how best to serve the needs of
the users of the dictionary. Do they need distinctions such as the BBI's
seven types of lexical collocations? Are these valid at all when examined
linguistically or logically? If our collocational dictionary (like any
dictionary, by definition) is aimed at users (and we should include
not only foreigners, but the less educated natives such as freshmen)
who are unsure of which words collocate with an entry word or what prepositions
and other particles may be used with it, is it not better to give a
more or less complete rundown of the collocations based on frequency
of occurrence in a database and without the encumbrances of a theoretical
framework of demonstrably dubious value? Even if you are preparing a
dictionary of limited size, if you do not have a corpus on which to
base it, how are you going to select the most typical or frequent collocations
for listing?
Besides serving as a guide to frequency of occurrence, a good database
can also help a lexicographer avoid errors.
Take the BBI's mistrust, for example:
mistrust n. 1. to arouse ~ 2. deep, profound ~ 3. ~ towards
DBase85 shows that besides arousing mistrust, one can also create,
dispel, eliminate, reduce, remove, and sow mistrust. And mistrust can
not only be deep and profound, but also great, growing, mutual, and
widespread. There is also an atmosphere, current, feeling, legacy, sense
of mistrust. Mistrust extends to mistrust and fear, mistrust and hostility,
and suspicion and mistrust. Finally, there is mistrust among, between,
by people, and mistrust in, of, and over a person or thing.
However, according to DBase85, there is hardly any mistrust towards
anyone, the only No. 3 collocation that the BBI has recorded. Here are
the frequencies: mistrust of (16), between (9), in (2), and among, by,
and over one each.
Here is the corresponding evidence from The Los Angeles Times of the
same period as cited above under aback: mistrust of (16), between (5),
in (1), and among, by, and toward one each. Note the supplanting of
over by toward in the last instance. Some vindication of the BBI!
The close correspondence in collocations between North American English
as edited in Los Angeles and in Toronto is almost uncanny. In this
particular, lexicography seems to attain the condition of an exact science,
as when we say that sugar is composed of 12 parts of carbon, 22 parts of
hydrogen, and 11 parts of oxygen, mixed in with pollutants like
"mistrust towards."
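The agreement between the two sets of counts quoted above for mistrust can be put in figures. The sketch below uses only the frequencies given in the text; the overlap measure (the sum, over all prepositions, of the smaller of the two relative shares) is my own illustrative choice, not a method used in preparing this article, and the function names are hypothetical.

```python
# Preposition counts after "mistrust", as quoted in the text.
DBASE85 = {"of": 16, "between": 9, "in": 2, "among": 1, "by": 1, "over": 1}
LA_TIMES = {"of": 16, "between": 5, "in": 1, "among": 1, "by": 1, "toward": 1}


def shares(counts):
    """Convert raw counts to relative shares that sum to 1."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def overlap(a, b):
    """Sum of the smaller share for each preposition; 1.0 means identical profiles."""
    sa, sb = shares(a), shares(b)
    return sum(min(sa.get(w, 0.0), sb.get(w, 0.0)) for w in set(sa) | set(sb))


def ranked(counts):
    """Prepositions in descending order of frequency, ties broken alphabetically."""
    return sorted(counts, key=lambda w: (-counts[w], w))


if __name__ == "__main__":
    print(ranked(DBASE85))
    print(ranked(LA_TIMES))
    print(round(overlap(DBASE85, LA_TIMES), 2))
```

On these figures the two profiles share roughly 0.84 of their mass, and "of" and "between" head both rankings; the samples are small, so the number is suggestive rather than conclusive.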
GENERAL-PURPOSE DICTIONARIES
Turning now to English dictionaries for general use, here are a dozen
new words and expressions of the mid-1980's which newly revised works
such as The Third Barnhart Dictionary of New English (1990), Webster's
New World Dictionary (1988), and The Random House Dictionary (Unabridged,
1987) have missed. These entries should also have been included in the
updated editions of the other major American dictionaries, namely, Webster's
Ninth New Collegiate Dictionary (1990 update) and The American Heritage
Dictionary (1985 update). The Random House Webster's College Dictionary
(1991) has entered date rape but not the others.
The following selection of entries is based on frequency of occurrence
in DBase85 and in other databases such as Mead Data Central's Nexis,
The Los Angeles Times on CD-ROM, the Toronto Globe & Mail and its
online version, Info Globe, the Dialog database, etc. These were searched
for additional evidence where DBase85 was found to be inadequate. However,
I have contented myself with quoting one early (if not the earliest)
and one recent citation for each entry. This is without prejudice to
the fact that every newspaper database seems able to yield scores and,
in some cases, hundreds of citations attesting to the currency of these
words and expressions in North American English during the past 15 years.
Since 1977, for example, The Washington Post alone has 191 citations
(barring third-party retrieval errors) for allow as how, 61 for hone
in on, and 213 for comes / goes with the territory. [Editor: The O.E.D.
style is used for the citations, with the punctuation made more explicit
using two commas, the abbreviations p. and sect., and a colon to introduce
each quote].
1. allow as how: concede or grant
[As recorded in Harold Wentworth's American Dialect Dictionary, this
usage is old and originally dialectal, but seems to have gained currency
in general North American English.]
1975 SANFORD J. UNGAR Atlantic, Sept., pp. 29-30: Given his turn, Byrd
allows as how the affair "is an honor for a country boy from the
hills and hollows of West Virginia."
1988 JOHN CRUICKSHANK The Globe & Mail, May 14, p. A1: Asked on
a live radio interview why he had gone on the attack, Mr. Mulroney replied:
"I was asked what I thought of 22 Liberal members of Parliament
actively conspiring to overthrow their leader without their leader being
aware, and I allowed as how that wasn't a very good thing."
2. bench strength: strength or power in reserve
1980 MURRAY CHASS The New York Times, June 1, Sect. 5, p. 1: The Yankees
have shown this season the kind of bench strength required by a team
with championship aspirations.
1991 LINDA HOSSIE The Globe & Mail, Jan. 16, p. A13: If the fighting
does go on that long, the U.S. war effort will come under considerable
strain because of a lack of "bench strength," said Martin
Shadwick of the Centre for International Strategic Studies at York University.
3. corporate culture: the customs and values characteristic of a corporation
as constituting a subculture
1982 CAROL KRUCOFF The Washington Post, July 12, p. C5: "Every
organization has a culture," they [Terrence E. Deal and Allan A.
Kennedy] write in Corporate Cultures: The Rites and Rituals of Corporate
Life [Reading, Mass.: Addison-Wesley], "a cohesion of values, myths,
heroes and symbols that has come to mean a great deal to the people
who work there."
1985 ZUHAIR KASHMERI The Globe & Mail, Mar. 12, p. P1: The Hudson's
Bay Co. has such a strong and entrenched don't-rock-the-boat corporate
culture that it militates against more aggressive merchandising.
4. date rape: a raping of an acquaintance during a social date
1981 J. C. BARDEN The New York Times, June 1, Sect. B, p. 5: The clearinghouse,
a nonprofit organization supported by donations and the fees of individual
and institutional members, maintains a library on marital, cohabitant
and date rapes.
1987 DEBORAH WILSON The Globe & Mail, Sept. 14, p. A1: "Working
on a college campus, I hear so much about date rapes.... I know pornography
helps foster such actions," [Prof. James Weaver of the University
of Kentucky] said.
5. hone in on something: focus on a subject with skill and efficiency.
1977 JERRY KNIGHT The Washington Post, Oct. 18, p. D8: As he outlined
a series of detailed tax provisions not unlike ones administration sources
have attacked as "loopholes," Jones honed in on what he called
the "red flag" in the administration's tax plan -- The Three-Martini
Lunch.
1985 DAVID TOWNSON The Globe & Mail, Mar. 16, p. P7: Rick Groen
writes that the director "hones in on the grisly pragmatics of
the absurd." An absurdity that editors should home [sic] in on
is the use of such illogical expressions.... Perhaps one could conceivably
hone in on a pencil sharpener, but writers who consider using the phrase
should hone their skills with the aid of a dictionary....
1985 [heading] The Globe & Mail, Apr. 12, p. P8: Strategists hone
in on groups of voters.
6. moonballing: in tennis, the lobbing of a ball high in the air
1977 BARRY LORGE The Washington Post, May 31, p. D1: Solomon reverted
to the "moonballing" style he used in 1972, his first appearance
here, when he was referred to as a "danger to low flying sparrows."
He tried to hypnotize Higueras with an assortment of lobs, loopers and
slow balls, but the Spaniard responded with much of the same.
1991 NORA McCABE The Globe & Mail, Aug. 1, p. C12: When play finally
commenced at 6:50 p.m., four-time Canadian champion Helen Kelesi was
forced to resort to moonballing to overcome her own spotty play before
ousting leggy American Carrie Cunningham 7-6 (8-6), 6-3.
7. [parade] rain on someone's parade: spoil the fun that someone has
planned
1977 LAWRENCE FEINBERG The Washington Post, Apr. 3, p. B2: [picture
caption] RAINED-ON PARADE -- Spectators line curb along Constitution
Avenue despite rain yesterday for Washington's annual Cherry Blossom
Parade. [Whether this usage is literal or figurative may be arguable,
but it shows at least the origin of the phrase]
1985 WALTER McCUTCHAN (New OED Project Committee, University of Waterloo,
Canada) The Globe & Mail, May 15, p. P7: Richard [sic] Burchfield's
successor, Dr. Edmund Weiner, let it slip that the last word scheduled
for inclusion in the supplement to the OED is zzz, more accurately "z
recurring," which I'm sure you'll recognize as a common word, used
to represent sleep, or sleeping. My apologies for raining on your zyrious
parade and I hope I have been able to put this rumor to rest.
8. prohibitive favourite: a sure bet or winner
1979 [no author] Time, July 16, p. 33: Going into the men's finals,
Borg was a prohibitive favorite. His dramatic two-handed backhand and
awesome forehand were supercharged with top spin and landing with uncanny
accuracy.
1990 MELVIN DURSLAG The Los Angeles Times, Aug. 22, p. C2: Jimmy ventured
his opinions under wraps, avoiding odds and ends and sizing up games
in such gentle terms as "razor's edge," "slight favorite,"
"favorite," "heavy favorite," and "prohibitive
favorite."
9. the sizzle, not/and/without the steak: superficial qualities or
surface features as opposed to substance
1985 ALAN NIESTER The Globe & Mail, Apr. 5, p. E9: Yet by and large,
this is a band that sells the sizzle more than the steak.
1990 KAREN STABINER The Los Angeles Times, Sept. 2, p. 6: To borrow
the advertising adage, it's not the sizzle, it's the steak.
10. priorize: organize using an order of priority; prioritize
1985 ANDREW CAMPBELL The Globe & Mail, Jul. 8, p. B3: Managers
should priorize the features that must be present, those they want to
be present and those they would like.
1991 [no author] The Mississauga News, Mar. 17, p. 1: [MP Don Blenkarn]
added that "I think all of us and our constituents treat medicare
as a must, but we have to do some priorizing. . . . It is a question
of priorization of expenditures."
11. take a company private/public: buy out a public company using private
funds or go public with a company and sell its shares on the stock exchange
1980 PAMELA G. HOLLIE The New York Times, June 22, sect. 3, p. 9: In
1979, Hambrecht & Quist took six companies public, among them the
Triad Systems Corporation, which was offered at $15, rose to $27 at
yearend and traded at 19 1/4 bid last Friday....
1985 [Associated Press] The Globe & Mail, Dec. 10, p. B20: Wall
Street analysts said the brief announcement appeared to confirm that
Mr. Icahn wants to reduce the cash portion of the purchase price for
the approximately 16.9 million TWA shares he must buy to take the company
private.
12. [territory] comes/goes with the territory: comes/goes with the specified
role as part of it
1977 JEAN M. WHITE The Washington Post, Apr. 24, p. A13: "I think
that with great success, sniping and attacks seem to go with the territory,"
the author of "Roots" said yesterday.
1985 JAY SCOTT The Globe & Mail, Apr. 19, p. E1: Hughes is a celebrity
but requests for autographs don't come with the territory, and not even
Mel Gibson is mobbed in his homeland.
In addition to my main thesis about the need for corpora in lexicography,
I would like to make the following observations:
1. In this day and age, one million citations seems far below the poverty
level (which I would set at 20 million) for the evidentiary base of
a dictionary of contemporary English boasting "a word list of more
than 170,000 entries" (Webster's New World), unless it is a purely
derivative work based on the O.E.D. and the bounty of other commercial
dictionaries. Psychologist and statistician John B. Carroll has estimated
that 500 million word tokens of English text would be required for a
complete picture of the vocabulary (the most common word types). This
seems within the reach of even a private lexicographer of modest means.
Well-heeled dictionary publishers with vast resources of men, women,
and materials ought to be able to do a much better job than they seem
to be doing.
2. The above means that the "butterfly method" of manually
collecting citations has to yield to more dynamic ways of assessing
evidence using computerized databases, either commercial ones as available
to the public through library systems and information brokers or customized
ones that could include various styles and levels of usage and genres
of writing as in the pioneering work of the Brown Corpus.
3. A diskpack (or jukebox or minichanger) of CD-ROM's could hold all
the evidence a dictionary publisher needs. So much hardware, and software
to go with it, could be built up in a year with a five-figure dollar
investment -- a comfortable range to work with. Such a system could
generate up to 10 billion citations. These are not citations of the
cut-and-dried variety, as may be stored in a card file, but citations
that come up on demand like fish popping up from the ocean when the
fisher thinks of his favourite variety of fish and snaps his fingers.
Using Boolean operators, you would type in a command like "one-alarm
or two-alarm or three-alarm or four-alarm but not fire or fires or blaze
or blazes." A citation that responded to this command might read:
"You need a stomach of steel to hold down a three-alarm [meaning
very hot] Indian curry."
4. The dozen entries cited above are just the tip of the iceberg. Out
there in the vast ocean of billions of words available on tap as full-text
databases lurk hundreds of words and phrases in common use waiting to
be entered in dictionaries, as in expressions from "The Age of
Aquarius it isn't" to "extra virgin olive oil." Incidentally,
a CD-ROM-based system such as DBase85 offers the word-lover who browses
in it daily opportunities for serendipity, one way of discovering new
words, whereas browsing in your butterfly collection on 3" x 5"
or 4" x 6" cards is like browsing through a morgue. You don't
make discoveries but only identify corpses.
5. Word-hunters' periodicals and projects such as The Barnhart Dictionary
Companion and "Among the New Words" of American Speech that
now specialize in what, in my view, are "novelty items" have
to be supplemented by publications that supply fresh bread-and-butter
words and expressions which, in fact, are older in the language and
enjoy greater currency in everyday English but unfortunately are not
to be found in our dictionaries.
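The Boolean command quoted in observation 3 above can be sketched as a simple filter over citations. This is a minimal illustration, assuming whole-word matching on lower-cased text; the tokenizer and the function names are hypothetical and do not reproduce the query language of any particular retrieval system.

```python
import re

# Terms from the command quoted in the text:
# "one-alarm or two-alarm or three-alarm or four-alarm
#  but not fire or fires or blaze or blazes"
ANY_OF = {"one-alarm", "two-alarm", "three-alarm", "four-alarm"}
NONE_OF = {"fire", "fires", "blaze", "blazes"}

# Words, keeping internal hyphens so that "three-alarm" stays one token.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokens(text):
    """Return the set of lower-cased word tokens in a citation."""
    return set(TOKEN.findall(text.lower()))


def matches(text, any_of=ANY_OF, none_of=NONE_OF):
    """True if the text has at least one wanted term and no excluded term."""
    words = tokens(text)
    return bool(words & any_of) and not (words & none_of)


def search(citations):
    """Return the citations that satisfy the Boolean command."""
    return [c for c in citations if matches(c)]


if __name__ == "__main__":
    sample = [
        "You need a stomach of steel to hold down a three-alarm Indian curry.",
        "A three-alarm fire destroyed the warehouse.",
        "The curry was mild.",
    ]
    for hit in search(sample):
        print(hit)
```

Excluding the fire-related terms is what brings the figurative uses to the surface, since the literal sense of "three-alarm" almost always appears alongside one of them.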
N O T E S
I wish to express my sincere thanks to my colleague Sol Steinmetz,
Executive Editor of the Random House Dictionaries, and professors Igor
Mel'cuk of the University of Montreal, Morton Benson of the University
of Pennsylvania, and Dr. Robert Ilson of University College, London,
for offering many useful comments that have helped me much in preparing
the final manuscript.
1. This writer was generous with his own meed of praise in a most exhaustive
review article in American Speech, vol. 64, no. 4 (Winter 1989), and
in his Acknowledgments to The Penguin Canadian Dictionary, Toronto,
1990.
2. Morton Benson (private communication) is quick to point out that
"the primacy of examples made up by compilers still stands unchallenged."
I agree with him on this. Databases should not be quoted as if they
were the word of God. Quotations from published texts are often too
wordy for lexicographical purposes. Unless one is compiling a dictionary
or vocabulary of the language of a particular text or person such as
the Bible or Shakespeare, quotations should be avoided. In dictionaries
of current English, it is more economical and efficient to fashion our
own illustrative phrases and sentences in succinct form based on the
data on hand.
3. "DBase85" is an inhouse term (not a trade name) used by
the author and is no cousin of dBase, Dbase, DBase, etc. which are trademarks
of commercial software. DBase85 is a conglomerate of manual files dating
from the sixties and assorted electronic files developed since the early
days of the microelectronic revolution, with "Info Globe on CD-ROM"
as its main component. The CD-ROM Info Globe, a world's first when published
in Toronto by The Globe & Mail, "Canada's National Newspaper,"
in 1986, is no longer commercially available; it also requires a special
interface card for making use of it. This author finds it valuable as
a good representation of mid-1980's North American vocabulary. The word
crack (cocaine), for example, which made its appearance in 1986, occurs
as rock in several places in Info Globe; e.g.: "Once cocaine was
relatively hard, and always dangerous, to obtain. Today, thanks to a
new institution known as 'Rock House,' it is as easy as stopping at
a local supermarket.... A rock is a fingernail-sized chunk of purified,
uncut cocaine, to be smoked, or 'freebased' in user parlance, in a pipe.
A rock, in the ghettoes of L.A., sells for a mere $25 -- half the price
of 1982." (William Scobie, Los Angeles, Jan. 12, 1985, p. E21).
(The earliest citation in The Third Barnhart Dictionary of New English
for rock is dated a year later.) The most recent neologisms can be researched
in the online Info Globe available with the daily newspaper.
Info Globe is also a good mirror of contemporary North American usage,
although a lexicographer would want to use it with the Globe & Mail
style book by the side.
4. Private communication from editor-in-chief.
5. Morton Benson claims (private communication and p. ix of the BBI)
that he has entered only "major collocations" which are "those
recurrent, unpredictable combinations that readily come to mind."
But unpredictable to whom and readily coming to whose mind? For example,
under the headword wrinkle, in the sense of "innovation,"
the only unpredictable combination that has apparently come to the author's
mind is "the latest wrinkle." But DBase85 shows that "new"
is the most frequent modifier of "wrinkle," occurring five
times more frequently than "latest." No one who relies on
his own mind could have predicted this. Further, the database shows
that only three prepositions are used following "wrinkle,"
"on" being twice as frequent as the others, which are "in"
and "to." Also "There's" is a recurrent feature
of structures in which "wrinkle" is used, as in "There's
a new wrinkle to the autonomy agreement." Only a good database
can help a lexicographer determine such facts of language use.
6. Of 8,174 citations for "operate" that DBase85 generated,
a check of the first 100 shows the verb having the following nouns as
direct object: [Editor: nouns arranged in alphabetical order of "operative"
nouns, not of modifiers]: aircraft (twice); business, clinic, company,
corporation, distillery, transition house, laboratory, DEW line, mill,
plant (2 each); project (2); information rack, restaurant, bus route,
holiday schedule, ferry service, satellite receiving station, stores,
foreign subsidiaries, system, remote-controlled tools, tours, and transmitter
(1).
"Use" had 31,910 citations, the first 20 of which had these
objects: attackers, deregulation, force, formula, incinerator, name
(2); pistol, [Boeing] 767, scare tactics, and trademark. Among other
things, these collocations show that "use" is not "a
synonym or near synonym" of "operate," as the BBI's use
of the comma in "operate, use" implies (BBI, xxxi). To operate
an abacus is to make it work, but to use one is to employ it; e.g.,
an elevator operator operates an elevator or makes it work, but the
others use or employ it or, more idiomatically, people take elevators.
The word abacus itself occurs ten times in DBase85, eight of them as
part of trade names and two in "Shop assistants make their sales
calculations on abacuses" and "the clack, clack, clack of
an abacus and a wild Cantonese whoop."
The omission of entries such as abacus in a collocational dictionary
makes room for more useful ones.
7. Benson objects that these words form only "free" combinations.
But the same could be said of "zoo," for which the BBI lists
"at" and "in" as collocations. "Zoological"
happens to collocate with "garden," "park," and
"society." These are not free collocations, at least not so
free as "conference," "director," "interest,"
"specimen," "training," "wonder," etc.
8. I find the ninefold distinction based on parts of speech used in
the first collocational dictionary, Kenkyusha's New Dictionary of English
Collocations by Prof. Senkichiro Katsumata (first published in Tokyo,
1939), a simpler and sounder system when you consider the needs of a
dictionary user. This work, however, is somewhat flawed in regard to
idiom and is badly in need of revision.
9. Errors are sure to occur in the first printing of any work of the
complexity of a dictionary. And this lexicographer believes it is not
fair to pick and choose errors from dictionaries since they are homogeneous
wholes. Examining randomly selected sample sections is the only sensible
way of doing it. But if I were to pitch on a half-column of the BBI
with eyes closed, this article would have taken up more space than the
editor might be willing to allow it. So, I will examine one two-line
entry, the point of the exercise being that some of the errors in the
dictionary could have been avoided if the lexicographers had used a
database.
10. I am not unaware of the many good books and periodicals dealing
with neologisms available on the market. Some of the dozen entries I
have used as examples may well have found their way into some of these
publications before or after this article was written. Also, a couple
of items are to be found in Robert Chapman's New Dictionary of American
Slang. But the question I am asking here is simply this: given the evidence,
how is it that these new words were passed over by the major dictionaries?
Most of these words, however, as well as scores of others such as cocooning,
deep pocket, downstream, heli-ski, hypertext, ice (cocaine), infomercial,
lifestyle advertising, loose cannon, negative option, people meter,
pinstriper, risk arbitrage, spin doctor, stonewashed, surrogate mother,
and (computer) virus, have been entered in The Penguin Canadian Dictionary.
11. However, tapping commercial databases for lexicographical citations
is an expensive proposition. The Globe & Mail online charges $180
an hour. If you are familiar with the strategies, a short session with
The Washington Post that nets ten citations could cost you up to $40.
An information broker would add a surcharge of at least 35%.
12. Traditional lexicographers, some of them still wedded to their
manual typewriters, have been kicking against the goad for over a decade.
The reactions to a paper ("Language Analysis and Lexicography by
Microcomputer") that I presented at the annual meeting of the American
Dialect Society in 1981 varied as follows: The dean of American lexicographers,
Clarence L. Barnhart, when I visited him on that occasion, showed me
his stack of Univac sorting cards from the fifties as if they were the
state of the art in citation gathering; The Barnhart Dictionary Companion,
started in 1985, is now a good example of a combination of the old and
the new methods of data collection. Robert W. Burchfield, Editor of
the O.E.D. Supplements, among other things, wrote, "Come off it!"
(a feeling echoed by Sidney Landau in his landmark book on lexicography),
although the very next year he had begun subscribing to Lexis, Nexis,
and Dialog. Bob's challenging "hope that from your millions of
words on line or standing in reserve demonstrably better dictionaries
will emerge" has now been realized, at least by the New O.E.D.,
it being physically impossible to do better than the O.E.D. John Sinclair
of the University of Birmingham, on the other hand, was more forward-looking
in 1981. He wrote: "Your technology must be way ahead of us....
I am really intrigued, and if your paper (please send) fleshes out your
bare statements, I shall arrive at your doorstep pretty soon."
I have regularly sent John almost every lexicographical idea that has
crossed my mind and I daresay he has made good use of them.
13. Two years before The Barnhart Dictionary Companion came out, I
had sent out a proposal for something similar using databases such as
Info Globe, Nexis, and Dialog. I think it is high time the BDC faced
some competition. I would suggest a quarterly with the generic title
"Dictionary Companion to Current English." I also believe
there is room in some linguistic journal for a column headed "Among
the Older Words."
[2 June 1992]
© 2002, Thomas M. Paikeday