From LEXICOGRAPHICA, International Annual of Lexicography, Max Niemeyer Verlag, Tübingen, August 1992, pp. 307-317.


O CORPORA!

by

Thomas M. Paikeday



ABSTRACT: This article seeks to show that, for any kind of lexicographical work, having a corpus of texts is as important as having a well-developed body on which to base a study of its anatomy. A special-purpose dictionary and standard general-purpose North American dictionaries are discussed as examples of entries, definitions, and illustrative material that could have benefited from a database capable of supplying a minimal 20 million citations.


The BBI Combinatory Dictionary of English, an admirable work, has been the subject of many commendatory articles since its publication in 1986. It seems only fair that we take a look here at how the BBI would have been a better work if it had been based on a corpus or database of contemporary English.

For this purpose, we will compare random samples such as the first and last half-dozen entries of the BBI with a more complete treatment of the same using a CD-ROM database of about 20 million words of edited North American English of the mid-1980's -- "DBase85" for short.

Such a check, especially of words that have newly come into the language, should also throw light on the deficiencies of dictionaries such as Webster's New World Dictionary (Third College Edition, 1988), which was compiled using a database of fewer than a million citations of the manual kind.

SPECIAL-PURPOSE DICTIONARY

Here are the first six entries of the BBI in their entirety:

aback adv. taken ~ (I was taken ~) ('I was startled')
abacus n. to operate, use an ~
abandon I n. (D; tr.) reckless, wild ~
abandon II v. (D; tr.) to ~ to (they ~ed us to our fate)
abbreviate v. (D; tr.) to ~ to (Esquire can be ~d to Esq.)
ABC n. as easy, simple as ~.

If we consider high-frequency words and phrases in DBase85, the BBI has missed many collocations more significant than those it enters, under the following headwords: A, aback, abandon n., abandon v., abandoned adj., abandonment, abase, abashed, abate, abatement, abbreviate, abbreviation, and ABC.

For comparison purposes, however, we will deal only with entries corresponding to those in the BBI. In the draft entries given below, brief definitions in roman precede the collocations in italics; as in the BBI, definitions are kept to a minimum. For contrastive effect, the collocations of aback are presented in sentences with minimal differences of meaning and structure. A particular meaning is given in parentheses (as under ABC) when there is a significant departure from the main definition. No attempt is made to pigeonhole the collocations into grammatical and lexical categories beyond the traditional parts of speech, pigeonholes being considered less important than what is put into them.

aback adv.: taken aback startled: I was taken ~; I was taken ~ by the announcement; I was taken ~ when I heard the announcement; I was taken ~ to hear that all the students had failed the test; I was taken ~ at how many students had failed the test; [rarely] The announcement took me ~.
[abacus is not one of our entries. Collocations of the type "operate, use an abacus" are considered "free" combinations.]
abandon n. carefree manner: She danced, played, sang with ~; to spend money with gay, gleeful, joyous, merry, mindless, wild ~; She partied with such ~ that she missed the flight home.
abandon v. give up or desert: The captain ordered his men to ~ ship; They ~ed it to its fate; He ~ed his naval career for one in the army; to ~ oneself to despair, drinking, grief, pleasure.
abbreviate v. shorten: "Professor" is ~d to "Prof."; "Professor" is used in ~d form as "Prof."; an ~d version of the story; to ~ a career, program, schedule, term of office, visit.
ABC n. the alphabet: as easy, simple as ~; to teach a child her ~s; the ~s (= basics) of using a word processor.

Frequency counts from DBase85 could be produced to justify inclusion of each of the entries and collocations above. However, let us be content with the counts for the five collocations of the first entry, aback. These are listed above in a logical order based on meaning. The frequency order, however, is different. The most frequent structure, aback + by-clause, occurs 27 times in DBase85, followed by aback + when (8), aback ending a clause or sentence (4), aback + to (3), aback + at (2), and aback used in the active voice (1).

Here is corroborating evidence from a section of The Los Angeles Times database comparable in size to DBase85: aback + by-clause (41 times); aback + when (4), aback ending a clause or sentence (3), aback + to (1), aback + at (1), and aback used in the active voice (3).
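For readers who want to try this kind of tally on their own texts, here is a minimal Python sketch of how structures following aback might be counted. It is not the retrieval software used with DBase85 or the Los Angeles Times database; the function names, the labels, and the rule that anything other than "taken aback" counts as active voice are assumptions made for illustration, and the sample sentences are the made-up examples from the draft entry above, not database citations.

```python
import re
from collections import Counter

# Words tracked after "taken aback"; anything else is labelled "other".
TRACKED = {"by", "when", "to", "at"}


def classify_aback(sentence):
    """Label the structure in which 'aback' occurs, or return None if absent."""
    match = re.search(r"\b(\w+)\s+aback\b(.*)", sentence, flags=re.IGNORECASE)
    if match is None:
        return None
    # Crude assumption: any word other than "taken" before "aback"
    # is treated as the active voice ("took me aback").
    if match.group(1).lower() != "taken":
        return "active"
    following = re.match(r"\s+([A-Za-z]+)", match.group(2))
    if following is None:
        return "end"  # punctuation or nothing follows: clause or sentence end
    word = following.group(1).lower()
    return word if word in TRACKED else "other"


def count_structures(sentences):
    """Tally the structure labels over an iterable of sentences."""
    labels = (classify_aback(s) for s in sentences)
    return Counter(label for label in labels if label is not None)


# Illustrative sentences taken from the draft entry for aback.
SAMPLE = [
    "I was taken aback.",
    "I was taken aback by the announcement.",
    "I was taken aback when I heard the announcement.",
    "I was taken aback to hear that all the students had failed the test.",
    "I was taken aback at how many students had failed the test.",
    "The announcement took me aback.",
]

if __name__ == "__main__":
    for label, n in count_structures(SAMPLE).most_common():
        print(label, n)
```

Run over a real corpus, split into sentences, the same tally would yield a frequency order of the kind reported above. A heuristic this simple will mislabel some citations, so the counts would still need a lexicographer's eye.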

Now, here are the last six entries of the BBI:

zodiac n. the signs of the ~
zone I n. 1. to establish, set up a ~ 2. a climatic; frigid; temperate; time; torrid ~ 3. a buffer; combat; communications; danger; demilitarized; drop; neutral; no-parking; no-passing; occupation; postal; safety; school; security; towaway; war ~ 4. an erogenous ~ (of the body)
zone II v. (d; tr.) to ~ as (they ~d the area as residential)
zoning n. exclusionary ~
zoo n. at, in a ~ (she works at the ~; wild animals are well cared for in our ~)
zoom in v. (d; intr.) to ~ on (the camera ~ed in on the podium)

A collocational dictionary based on a good database would include also entries such as zombie, zonk, and zoological. In fairness to the BBI, however, we will confine our comparison to the six entries:

zodiac n. a diagram of astrological signs: the 12 signs of the ~; Money-making is not in my ~; I'm a Libra, what's her ~ sign? She shares the same ~ sign as Marilyn Monroe.

zone n. a special area or region: a buffer ~ separating two warring countries; a combat ~; a downtown commercial ~; a danger ~; the demilitarized ~ between two warring nations; the end ~ (behind either goal line in North American football); a 12-mile coastal fishing ~; the mouth, the behind, and such erogenous ~s of the body; Peace-loving nations wish to declare their countries nuclear-free ~s; a military occupation ~; a postal, residential, towaway ~; Tokyo and Seoul are in the same time ~, nine hours ahead of London; the Torrid, Temperate, and Frigid geographic ~s of the earth; in the twilight ~ (= grey area or borderline) of morality; a war ~.
--v. form or divide into zones: A city is ~d into commercial, residential, and industrial districts; land ~d (as) agricultural; land ~d for agricultural use; agriculturally ~d land.
--zoning adj.: ~ approval; a ~ bylaw, change, classification, permit, restriction, requirement, violation.
zoo n. 1 a place where wild animals are kept for display: a children's petting ~; Bears should be kept in a ~; A fox at the ~ had rabies; The animal died in the Bronx ~. 2 a crowded or noisy place: The festival was like a ~; Large cities are human ~s; Our parliament sometimes becomes a political ~.
zoom v. 1 move with a buzz or whoosh: The birds ~ed toward the plaza; The Mirage jet ~ed up and out of sight. 2 move upward like zooming: Her tennis ranking has ~ed from 100 to 10 in a year; Gas prices ~ed after the oil embargo; The economy didn't ~ during the war. 3 focus using a lens that gives quick close-ups: a TV picture that ~s from an entire football field to the helmet of one of the players; The camera ~ed in on the star of the show.
--adj.: A ~ lens or zoomer is a photographic device for taking quick close-up shots without having to adjust the focus; There is a ~ telescope at the top of the tower.
--interj.: Zoom! She was an instant celebrity.

The style of the above presentation may not exactly suit the BBI's scheme of things. But the question is, how best to serve the needs of the users of the dictionary. Do they need distinctions such as the BBI's seven types of lexical collocations? Are these valid at all when examined linguistically or logically? If our collocational dictionary (like any dictionary, by definition) is aimed at users (and we should include not only foreigners, but the less educated natives such as freshmen) who are unsure of which words collocate with an entry word or what prepositions and other particles may be used with it, is it not better to give a more or less complete rundown of the collocations based on frequency of occurrence in a database and without the encumbrances of a theoretical framework of demonstrably dubious value? Even if you are preparing a dictionary of limited size, if you do not have a corpus on which to base it, how are you going to select the most typical or frequent collocations for listing?

Besides serving as a guide to frequency of occurrence, a good database can also help a lexicographer avoid errors.

Take the BBI's mistrust, for example:

mistrust n. 1. to arouse ~ 2. deep, profound ~ 3. ~ towards

DBase85 shows that besides arousing mistrust, one can also create, dispel, eliminate, reduce, remove, and sow mistrust. And mistrust can not only be deep and profound, but also great, growing, mutual, and widespread. There is also an atmosphere, current, feeling, legacy, sense of mistrust. Mistrust extends to mistrust and fear, mistrust and hostility, and suspicion and mistrust. Finally, there is mistrust among, between, by people, and mistrust in, of, and over a person or thing.

However, according to DBase85, there is hardly any mistrust towards anyone, the only No. 3 collocation that the BBI has recorded. Here are the frequencies: mistrust of (16), between (9), in (2), and among, by, and over one each.

Here is the corresponding evidence from The Los Angeles Times of the same period as cited above under aback: mistrust of (16), between (5), in (1), and among, by, and toward one each. Note the supplanting of over by toward in the last instance. Some vindication of the BBI!

The correspondence in collocations between North American English as edited in Los Angeles and in Toronto is close enough to be almost uncanny. In this particular, lexicography seems to achieve the condition of an exact science, like saying that sugar is composed of 12 parts of carbon, 22 parts of hydrogen, and 11 parts of oxygen, with pollutants like "mistrust towards" mixed in.


GENERAL-PURPOSE DICTIONARIES

Turning now to English dictionaries for general use, here are a dozen new words and expressions of the mid-1980's which newly revised works such as The Third Barnhart Dictionary of New English (1990), Webster's New World Dictionary (1988), and The Random House Dictionary (Unabridged, 1987) have missed. These entries should also have been included in the updated editions of the other major American dictionaries, namely, Webster's Ninth New Collegiate Dictionary (1990 update) and The American Heritage Dictionary (1985 update). The Random House Webster's College Dictionary (1991) has entered date rape but not the others.

The following selection of entries is based on frequency of occurrence in DBase85 and in other databases such as Mead Data Central's Nexis, The Los Angeles Times on CD-ROM, the Toronto Globe & Mail and its online version, Info Globe, the Dialog database, etc. These were searched for additional evidence where DBase85 was found to be inadequate. However, I have contented myself with quoting one early (if not the earliest) and one recent citation for each entry. This is without prejudice to the fact that every newspaper database seems able to yield scores and, in some cases, hundreds of citations attesting to the currency of these words and expressions in North American English during the past 15 years. Since 1977, for example, The Washington Post alone has 191 citations (barring third-party retrieval errors) for allow as how, 61 for hone in on, and 213 for comes / goes with the territory. [Editor: The O.E.D. style is used for the citations, with the punctuation made more explicit using two commas, the abbreviations p. and sect., and a colon to introduce each quote].

1. allow as how: concede or grant

[As recorded in Harold Wentworth's American Dialect Dictionary, this usage is old and originally dialectal, but seems to have gained currency in general North American English.]

1975 SANFORD J. UNGAR Atlantic, Sept., pp. 29-30: Given his turn, Byrd allows as how the affair "is an honor for a country boy from the hills and hollows of West Virginia."

1988 JOHN CRUICKSHANK The Globe & Mail, May 14, p. A1: Asked on a live radio interview why he had gone on the attack, Mr. Mulroney replied: "I was asked what I thought of 22 Liberal members of Parliament actively conspiring to overthrow their leader without their leader being aware, and I allowed as how that wasn't a very good thing."

2. bench strength: strength or power in reserve

1980 MURRAY CHASS The New York Times, June 1, Sect. 5, p. 1: The Yankees have shown this season the kind of bench strength required by a team with championship aspirations.

1991 LINDA HOSSIE The Globe & Mail, Jan. 16, p. A13: If the fighting does go on that long, the U.S. war effort will come under considerable strain because of a lack of "bench strength," said Martin Shadwick of the Centre for International Strategic Studies at York University.

3. corporate culture: the customs and values characteristic of a corporation as constituting a subculture

1982 CAROL KRUCOFF The Washington Post, July 12, p. C5: "Every organization has a culture," they [Terrence E. Deal and Allan A. Kennedy] write in Corporate Cultures: The Rites and Rituals of Corporate Life [Reading, Mass.: Addison-Wesley], "a cohesion of values, myths, heroes and symbols that has come to mean a great deal to the people who work there."

1985 ZUHAIR KASHMERI The Globe & Mail, Mar. 12, p. P1: The Hudson's Bay Co. has such a strong and entrenched don't-rock-the-boat corporate culture that it militates against more aggressive merchandising.

4. date rape: a raping of an acquaintance during a social date

1981 J. C. BARDEN The New York Times, June 1, Sect. B, p. 5: The clearinghouse, a nonprofit organization supported by donations and the fees of individual and institutional members, maintains a library on marital, cohabitant and date rapes.

1987 DEBORAH WILSON The Globe & Mail, Sept. 14, p. A1: "Working on a college campus, I hear so much about date rapes.... I know pornography helps foster such actions," [Prof. James Weaver of the University of Kentucky] said.

5. hone in on something: focus on a subject with skill and efficiency

1977 JERRY KNIGHT The Washington Post, Oct. 18, p. D8: As he outlined a series of detailed tax provisions not unlike ones administration sources have attacked as "loopholes," Jones honed in on what he called the "red flag" in the administration's tax plan -- The Three-Martini Lunch.

1985 DAVID TOWNSON The Globe & Mail, Mar. 16, p. P7: Rick Groen writes that the director "hones in on the grisly pragmatics of the absurd." An absurdity that editors should home [sic] in on is the use of such illogical expressions.... Perhaps one could conceivably hone in on a pencil sharpener, but writers who consider using the phrase should hone their skills with the aid of a dictionary....

1985 [heading] The Globe & Mail, Apr. 12, p. P8: Strategists hone in on groups of voters.

6. moonballing: in tennis, the lobbing of a ball high in the air

1977 BARRY LORGE The Washington Post, May 31, p. D1: Solomon reverted to the "moonballing" style he used in 1972, his first appearance here, when he was referred to as a "danger to low flying sparrows." He tried to hypnotize Higueras with an assortment of lobs, loopers and slow balls, but the Spaniard responded with much of the same.

1991 NORA McCABE The Globe & Mail, Aug. 1, p. C12: When play finally commenced at 6:50 p.m., four-time Canadian champion Helen Kelesi was forced to resort to moonballing to overcome her own spotty play before ousting leggy American Carrie Cunningham 7-6 (8-6), 6-3.

7. [parade] rain on someone's parade: spoil the fun that someone has planned

1977 LAWRENCE FEINBERG The Washington Post, Apr. 3, p. B2: [picture caption] RAINED-ON PARADE -- Spectators line curb along Constitution Avenue despite rain yesterday for Washington's annual Cherry Blossom Parade. [Whether this usage is literal or figurative may be arguable, but it shows at least the origin of the phrase]

1985 WALTER McCUTCHAN (New OED Project Committee, University of Waterloo, Canada) The Globe & Mail, May 15, p. P7: Richard [sic] Burchfield's successor, Dr. Edmund Weiner, let it slip that the last word scheduled for inclusion in the supplement to the OED is zzz, more accurately "z recurring," which I'm sure you'll recognize as a common word, used to represent sleep, or sleeping. My apologies for raining on your zyrious parade and I hope I have been able to put this rumor to rest.

8. prohibitive favourite: a sure bet or winner

1979 [no author] Time, July 16, p. 33: Going into the men's finals, Borg was a prohibitive favorite. His dramatic two-handed backhand and awesome forehand were supercharged with top spin and landing with uncanny accuracy.

1990 MELVIN DURSLAG The Los Angeles Times, Aug. 22, p. C2: Jimmy ventured his opinions under wraps, avoiding odds and ends and sizing up games in such gentle terms as "razor's edge," "slight favorite," "favorite," "heavy favorite," and "prohibitive favorite."

9. the sizzle, not/and/without the steak: superficial qualities or surface features as opposed to substance

1985 ALAN NIESTER The Globe & Mail, Apr. 5, p. E9: Yet by and large, this is a band that sells the sizzle more than the steak.

1990 KAREN STABINER The Los Angeles Times, Sept. 2, p. 6: To borrow the advertising adage, it's not the sizzle, it's the steak.

10. priorize: organize using an order of priority; prioritize

1985 ANDREW CAMPBELL The Globe & Mail, Jul. 8, p. B3: Managers should priorize the features that must be present, those they want to be present and those they would like.

1991 [no author] The Mississauga News, Mar. 17, p. 1: [MP Don Blenkarn] added that "I think all of us and our constituents treat medicare as a must, but we have to do some priorizing. . . . It is a question of priorization of expenditures."

11. take a company private/public: buy out a public company using private funds or go public with a company and sell its shares on the stock exchange

1980 PAMELA G. HOLLIE The New York Times, June 22, sect. 3, p. 9: In 1979, Hambrecht & Quist took six companies public, among them the Triad Systems Corporation, which was offered at $15, rose to $27 at yearend and traded at 19 1/4 bid last Friday....

1985 [Associated Press] The Globe & Mail, Dec. 10, p. B20: Wall Street analysts said the brief announcement appeared to confirm that Mr. Icahn wants to reduce the cash portion of the purchase price for the approximately 16.9 million TWA shares he must buy to take the company private.

12. [territory] comes/goes with the territory: comes/goes with the specified role as part of it

1977 JEAN M. WHITE The Washington Post, Apr. 24, p. A13: "I think that with great success, sniping and attacks seem to go with the territory," the author of "Roots" said yesterday.

1985 JAY SCOTT The Globe & Mail, Apr. 19, p. E1: Hughes is a celebrity but requests for autographs don't come with the territory, and not even Mel Gibson is mobbed in his homeland.

In addition to my main thesis about the need for corpora in lexicography, I would like to make the following observations:

1. In this day and age, one million citations seems far below the poverty level (which I would set at 20 million) for the evidentiary base of a dictionary of contemporary English boasting "a word list of more than 170,000 entries" (Webster's New World), unless it is a purely derivative work based on the O.E.D. and the bounty of other commercial dictionaries. Psychologist and statistician John B. Carroll has estimated that 500 million word tokens of English text would be required for a complete picture of the vocabulary (the most common word types). This seems within the reach of even a private lexicographer of modest means. Well-heeled dictionary publishers with vast resources of men, women, and materials ought to be able to do a much better job than they seem to be doing.

2. The above means that the "butterfly method" of manually collecting citations has to yield to more dynamic ways of assessing evidence using computerized databases, either commercial ones as available to the public through library systems and information brokers or customized ones that could include various styles and levels of usage and genres of writing as in the pioneering work of the Brown Corpus.

3. A diskpack (or jukebox or minichanger) of CD-ROM's could hold all the evidence a dictionary publisher needs. So much hardware, and software to go with it, could be built up in a year with a five-figure dollar investment -- a comfortable range to work with. Such a system could generate up to 10 billion citations. These are not citations of the cut-and-dried variety, as may be stored in a card file, but citations that come up on demand like fish popping up from the ocean when the fisher thinks of his favourite variety of fish and snaps his fingers. Using Boolean operators, you would type in a command like "one-alarm or two-alarm or three-alarm or four-alarm but not fire or fires or blaze or blazes." A citation that responded to this command might read: "You need a stomach of steel to hold down a three-alarm [meaning very hot] Indian curry."

4. The dozen entries cited above are just the tip of the iceberg. Out there in the vast ocean of billions of words available on tap as full-text databases lurk hundreds of words and phrases in common use waiting to be entered in dictionaries, as in expressions from "The Age of Aquarius it isn't" to "extra virgin olive oil." Incidentally, a CD-ROM-based system such as DBase85 offers the word-lover who browses in it daily opportunities for serendipity, one way of discovering new words, whereas browsing in your butterfly collection on 3" x 5" or 4" x 6" cards is like browsing through a morgue. You don't make discoveries but only identify corpses.

5. Word-hunters' periodicals and projects such as The Barnhart Dictionary Companion and "Among the New Words" of American Speech that now specialize in what, in my view, are "novelty items" have to be supplemented by publications that supply fresh bread-and-butter words and expressions which, in fact, are older in the language and enjoy greater currency in everyday English but unfortunately are not to be found in our dictionaries.
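The Boolean command quoted in observation 3 can be imitated in a few lines of Python. This is a rough sketch only: it does not reproduce the query syntax of any actual CD-ROM or online retrieval system, and the names ANY_OF, NONE_OF, tokens, and matches are hypothetical. It treats the command as "at least one of the alarm terms, and none of the fire terms."

```python
import re

# Terms joined by "or" in the command: at least one must be present.
ANY_OF = {"one-alarm", "two-alarm", "three-alarm", "four-alarm"}
# Terms following "but not": none may be present.
NONE_OF = {"fire", "fires", "blaze", "blazes"}


def tokens(text):
    """Return the set of lowercased words, keeping hyphenated words whole."""
    return set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))


def matches(text, any_of=ANY_OF, none_of=NONE_OF):
    """True if the text contains some any_of term and no none_of term."""
    words = tokens(text)
    return bool(words & any_of) and not (words & none_of)


if __name__ == "__main__":
    citations = [
        "You need a stomach of steel to hold down a three-alarm Indian curry.",
        "Firefighters fought a three-alarm blaze downtown.",  # invented example
    ]
    for citation in citations:
        print(matches(citation), citation)
```

The first citation is the one quoted in observation 3 and passes the filter; the second, an invented sentence, is rejected because it contains "blaze." Filtering out the literal sense is what brings the figurative uses to the surface.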


NOTES

I wish to express my sincere thanks to my colleague Sol Steinmetz, Executive Editor of the Random House Dictionaries, and professors Igor Mel'cuk of the University of Montreal, Morton Benson of the University of Pennsylvania, and Dr. Robert Ilson of University College, London, for offering many useful comments that have helped me much in preparing the final manuscript.

1. This writer was generous with his own meed of praise in a most exhaustive review article in American Speech, vol. 64, no. 4 (Winter 1989), and in his Acknowledgments to The Penguin Canadian Dictionary, Toronto, 1990.

2. Morton Benson (private communication) is quick to point out that "the primacy of examples made up by compilers still stands unchallenged." I agree with him on this. Databases should not be quoted as if they were the word of God. Quotations from published texts are often too wordy for lexicographical purposes. Unless one is compiling a dictionary or vocabulary of the language of a particular text or person such as the Bible or Shakespeare, quotations should be avoided. In dictionaries of current English, it is more economical and efficient to fashion our own illustrative phrases and sentences in succinct form based on the data on hand.

3. "DBase85" is an in-house term (not a trade name) used by the author and is no cousin of dBase, Dbase, DBase, etc., which are trademarks of commercial software. DBase85 is a conglomerate of manual files dating from the sixties and assorted electronic files developed since the early days of the microelectronic revolution, with "Info Globe on CD-ROM" as its main component. The CD-ROM Info Globe, a world's first when published in Toronto by The Globe & Mail, "Canada's National Newspaper," in 1986, is no longer commercially available; it also requires a special interface card for making use of it. This author finds it valuable as a good representation of mid-1980's North American vocabulary. The word crack (cocaine), for example, which made its appearance in 1986, occurs as rock in several places in Info Globe; e.g.: "Once cocaine was relatively hard, and always dangerous, to obtain. Today, thanks to a new institution known as 'Rock House,' it is as easy as stopping at a local supermarket.... A rock is a fingernail-sized chunk of purified, uncut cocaine, to be smoked, or 'freebased' in user parlance, in a pipe. A rock, in the ghettoes of L.A., sells for a mere $25 -- half the price of 1982." (William Scobie, Los Angeles, Jan. 12, 1985, p. E21). (The earliest citation in The Third Barnhart Dictionary of New English for rock is dated a year later.) The most recent neologisms can be researched in the online Info Globe available with the daily newspaper.

Info Globe is also a good mirror of contemporary North American usage, although a lexicographer would want to use it with the Globe & Mail style book by the side.

4. Private communication from editor-in-chief.

5. Morton Benson claims (private communication and p. ix of the BBI) that he has entered only "major collocations" which are "those recurrent, unpredictable combinations that readily come to mind." But unpredictable to whom and readily coming to whose mind? For example, under the headword wrinkle, in the sense of "innovation," the only unpredictable combination that has apparently come to the author's mind is "the latest wrinkle." But DBase85 shows that "new" is the most frequent modifier of "wrinkle," occurring five times more frequently than "latest." No one who relies on his own mind could have predicted this. Further, the database shows that only three prepositions are used following "wrinkle," "on" being twice as frequent as the others, which are "in" and "to." Also "There's" is a recurrent feature of structures in which "wrinkle" is used, as in "There's a new wrinkle to the autonomy agreement." Only a good database can help a lexicographer determine such facts of language use.

6. Of 8,174 citations for "operate" that DBase85 generated, a check of the first 100 shows the verb having the following nouns as direct object: [Editor: nouns arranged in alphabetical order of "operative" nouns, not of modifiers]: aircraft (twice); business, clinic, company, corporation, distillery, transition house, laboratory, DEW line, mill, plant (2 each); project (2); information rack, restaurant, bus route, holiday schedule, ferry service, satellite receiving station, stores, foreign subsidiaries, system, remote-controlled tools, tours, and transmitter (1).

"Use" had 31,910 citations, the first 20 of which had these objects: attackers, deregulation, force, formula, incinerator, name (2); pistol, [Boeing] 767, scare tactics, and trademark. Among other things, these collocations show that "use" is not "a synonym or near synonym" of "operate," as the BBI's use of the comma in "operate, use" implies (BBI, xxxi). To operate an abacus is to make it work, but to use one is to employ it; e.g., an elevator operator operates an elevator or makes it work, but the others use or employ it or, more idiomatically, people take elevators.

The word abacus itself occurs ten times in DBase85, eight of them as part of trade names and two in "Shop assistants make their sales calculations on abacuses" and "the clack, clack, clack of an abacus and a wild Cantonese whoop."

The omission of entries such as abacus in a collocational dictionary makes room for more useful ones.

7. Benson objects that these words form only "free" combinations. But the same could be said of "zoo," for which the BBI lists "at" and "in" as collocations. "Zoological" happens to collocate with "garden," "park," and "society." These are not free collocations, at least not so free as "conference," "director," "interest," "specimen," "training," "wonder," etc.

8. I find the ninefold distinction based on parts of speech used in the first collocational dictionary, Kenkyusha's New Dictionary of English Collocations by Prof. Senkichiro Katsumata (first published in Tokyo, 1939), a simpler and sounder system when you consider the needs of a dictionary user. This work, however, is somewhat flawed in regard to idiom and is badly in need of revision.

9. Errors are sure to occur in the first printing of any work of the complexity of a dictionary. And this lexicographer believes it is not fair to pick and choose errors from dictionaries since they are homogeneous wholes. Examining randomly selected sample sections is the only sensible way of doing it. But if I were to pitch on a half-column of the BBI with eyes closed, this article would have taken up more space than the editor might be willing to allow it. So, I will examine one two-line entry, the point of the exercise being that some of the errors in the dictionary could have been avoided if the lexicographers had used a database.

10. I am not unaware of the many good books and periodicals dealing with neologisms available on the market. Some of the dozen entries I have used as examples may well have found their way into some of these publications before or after this article was written. Also, a couple of items are to be found in Robert Chapman's New Dictionary of American Slang. But the question I am asking here is simply this: given the evidence, how is it that these new words were passed over by the major dictionaries? Most of these words, however, as well as scores of others such as cocooning, deep pocket, downstream, heli-ski, hypertext, ice (cocaine), infomercial, lifestyle advertising, loose cannon, negative option, people meter, pinstriper, risk arbitrage, spin doctor, stonewashed, surrogate mother, and (computer) virus, have been entered in The Penguin Canadian Dictionary.

11. However, tapping commercial databases for lexicographical citations is an expensive proposition. The Globe & Mail online charges $180 an hour. If you are familiar with the strategies, a short session with The Washington Post that nets ten citations could cost you up to $40. An information broker would add a surcharge of at least 35%.

12. Traditional lexicographers, some of them still wedded to their manual typewriters, have been kicking against the goad for over a decade. The reactions to a paper ("Language Analysis and Lexicography by Microcomputer") that I presented at the annual meeting of the American Dialect Society in 1981 varied as follows: The dean of American lexicographers, Clarence L. Barnhart, when I visited him on that occasion, showed me his stack of Univac sorting cards from the fifties as if they were the state of the art in citation gathering; The Barnhart Dictionary Companion, started in 1985, is now a good example of a combination of the old and the new methods of data collection. Robert W. Burchfield, Editor of the O.E.D. Supplements, among other things, wrote, "Come off it!" (a feeling echoed by Sidney Landau in his landmark book on lexicography), although the very next year he had begun subscribing to Lexis, Nexis, and Dialog. Bob's challenging "hope that from your millions of words on line or standing in reserve demonstrably better dictionaries will emerge" has now been realized, at least by the New O.E.D., it being physically impossible to do better than the O.E.D. John Sinclair of the University of Birmingham, on the other hand, was more forward-looking in 1981. He wrote: "Your technology must be way ahead of us.... I am really intrigued, and if your paper (please send) fleshes out your bare statements, I shall arrive at your doorstep pretty soon." I have regularly sent John almost every lexicographical idea that has crossed my mind and I daresay he has made good use of them.

13. Two years before The Barnhart Dictionary Companion came out, I had sent out a proposal for something similar using databases such as Info Globe, Nexis, and Dialog. I think it is high time the BDC faced some competition. I would suggest a quarterly with the generic title "Dictionary Companion to Current English." I also believe there is room in some linguistic journal for a column headed "Among the Older Words."

[2 June 1992]

 

© 2002, Thomas M. Paikeday