What is a word? Myths and assumptions

Dictionaries expose us to the rich variety of concepts we can express with the English language. But when writing dictionaries, we are often challenged by the variety of ways English can be written. We don't just have to deal with the twenty-six letters of the alphabet.

Sometimes people rely on myths about the way English words are written: these false assumptions can cause problems, especially when computers are told to make these assumptions.

English words do not use accents

English has more than a hundred words with accents and diacritics in relatively common use. One of the most frequent, café, is a word with an acute accent that many learners will come across in their first year of learning English. Most of these words come from other languages. From French we have café as well as many others: crèche (which has a grave accent), crêpe (with a circumflex), façade (the cedilla tells us that the c is pronounced like an s), and naïve (the diaeresis tells us the i must be pronounced separately from the a). Spanish gives us jalapeño (the tilde indicates the letter is a sort of ny sound); doppelgänger is a German loan-word with an umlaut indicating the vowel is to be pronounced more like an 'e' than an 'a'.

There are even a few occasions where diacritics have been added after a word has entered the language - coöperate uses a diaeresis to indicate the two o vowels are separate syllables; nowadays, however, a hyphen to separate the vowels is more common (co-operate).
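For software, this myth often takes the form of a pattern that accepts only the letters A-Z. The following Python sketch is an illustration rather than a recommended tokenizer: the function names are invented for this example, and it assumes that normalising the text to NFC (so that each accented letter is a single character) is acceptable for the application.

```python
import re
import unicodedata

# Matches runs of unaccented letters only: the "myth" version.
ASCII_WORD = re.compile(r"[A-Za-z]+")

# Matches runs of any Unicode letter: word characters minus digits and "_".
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def ascii_words(text):
    """Find words assuming English uses only the letters A-Z."""
    return ASCII_WORD.findall(text)


def unicode_words(text):
    """Find words made of any letters, accented or not."""
    # NFC turns "e" + combining accent into the single character "é",
    # so that the letter pattern does not stop at the combining mark.
    return UNICODE_WORD.findall(unicodedata.normalize("NFC", text))


print(ascii_words("a naïve café owner"))
print(unicode_words("a naïve café owner"))
```

The first call breaks naïve into "na" and "ve" and shortens café to "caf"; the second keeps both words whole.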

Accents do not matter in English

Many words written with accents can also be spelt without them: naïve is probably more often spelt naive. Spelling façade without the cedilla is usually considered incorrect, but at least there is no word to confuse it with. However, there are some occasions where the accent does distinguish between two words: perhaps the most frequent is résumé, a summary of work you have done that you show to a potential employer; written without accents it is indistinguishable from resume, a verb meaning to start doing something that you did before but stopped. The meat spread pâté should also not be written pate, as that refers to the top of someone's head.
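A common preprocessing step in software is to strip accents before comparing or indexing words. The sketch below shows one way this is often done in Python, using the standard unicodedata module; the function name strip_accents is invented for this example. It illustrates the problem described above and is not a recommendation.

```python
import unicodedata


def strip_accents(word):
    """Remove combining marks (accents, cedillas, diaereses) from a word."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ["naïve", "façade", "résumé", "pâté"]:
    print(word, "->", strip_accents(word))
```

For naïve and façade the result is simply an alternative spelling, but résumé becomes resume and pâté becomes pate, so two different words can no longer be told apart.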

English words contain only the letters A-Z and accents

Numbers are increasingly becoming an integral part of the English language. 3G is almost never written or pronounced "third generation", at least when it refers to mobile connections. Many chemical formulas with subscript numerals are now in common use - H₂O for water and CO₂ for carbon dioxide, sometimes in spoken English too. Measurements also use superscripts: cm² stands for "centimeters squared" or "square centimeters".

English words contain only letters A-Z, accents and numbers

The English apostrophe is used to show that a letter has been left out, e.g. in I'm or can't, or with s to show possession, e.g. Anna's. It can also occur at the beginning of words, as in 'twas, or at the end, as in the builders' tools (= the tools of the builders). Some words like 'phone can be written with an apostrophe (indicating they are short for telephone, though this use is increasingly rare). When telling the time, o'clock is used to refer to a time which is exactly on the hour. It's short for of the clock though the long form is no longer used.

In American English in particular, dots (or periods) are used after letters to indicate when they are part of an acronym, whereas in British English it is more usual not to: the U.N. (American) vs the UN (British). Where the remaining letter is lower case, full stops are more normal (c., short for circa, meaning about). Units of measurement, on the other hand, usually do without: 10 mm, 7 lb.

Slashes are used in a similar way in a few specific cases - c/o meaning care of (used in addresses when the person you are writing to is staying at someone else's home). w/ for with and w/o for without are sometimes seen. Slashes can also be used in units of measurement, e.g. km/h, a written abbreviation for kilometres per hour.

It is rare to find a word that contains punctuation without being abbreviated or inflected in any way. The programming language C++ is one.

In addition to words you might find in a dictionary, if you're analysing real English text, you'll also come across dates, times (08:30), currency (£2.50p), large numbers (1,337), fractions (1/4 or 0.25 or 25%) and other things (a 2:1; a 4×4; email and web addresses; smileys) with characters that behave more like part of a word than like punctuation.
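To see how these cases affect software, the Python sketch below compares two simple patterns. Both function names and both patterns are assumptions made for this illustration, not a description of how any particular tool works. The first treats every punctuation mark as a word boundary; the second allows an apostrophe, slash, full stop or hyphen between two word characters.

```python
import re

# Every run of letters, digits or "_" is a word; all punctuation separates.
SIMPLE = re.compile(r"\w+")

# As above, but one of ' ’ / . - may join two runs of word characters.
INTERNAL = re.compile(r"\w+(?:['’/.\-]\w+)*")


def simple_tokens(text):
    """Split text at every punctuation mark and space."""
    return SIMPLE.findall(text)


def internal_punct_tokens(text):
    """Split text but keep punctuation that sits inside a word."""
    return INTERNAL.findall(text)


sample = "I can't meet at 3 o'clock"
print(simple_tokens(sample))
print(internal_punct_tokens(sample))
```

The second pattern keeps can't, o'clock, c/o and km/h together, but it still loses punctuation at the edges of a word: 'twas comes out as "twas", U.N. as "U.N" and C++ as "C". Handling those cases needs knowledge of the words themselves, not just of the characters.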

English words are always single words

Many words in English don't occur on their own at all, or sometimes occur in phrases where the constituent parts don't keep their original meaning. Language experts sometimes disagree on where words begin or end, especially as some words can be written as two words or one.

The compound noun is made up of two or more words which together are taken as a single noun. Some are fairly easy to decode: rock salt is salt from a rock; lemon grass is a grass that smells like lemon (though it is not related to the lemon fruit). But many are much less intuitive: a red admiral is not a naval officer, but a butterfly. A grease monkey is someone who repairs engines, and a dead ringer is not a deceased campanologist but someone who looks a lot like someone else.

Some words, especially of foreign origin, only appear in compounds: cappella only exists in English in a cappella (a type of singing). The a in this case is from Italian and unrelated to English a, an, and is an inseparable part of the compound.

Hyphenated compounds are usually easier to spot (if the writer has remembered the hyphen!). left-handed is an adjective describing something done with the left hand, someone who uses their left hand more than their right, or things designed for use with the left hand. The verb to hot-desk is to share one or more desks with others, usually where there are fewer desks than people.

Phrasal verbs are another sort of multi-word unit: broken machines and naughty children are said to play up. Often the object of the phrasal verb divides the verb from the preposition, for instance to take something down, meaning to make a written note of something.

All sorts of other phrases are defined in the dictionary. In our dictionaries we refer to many of these constructions as idioms, for example: birds of a feather (people who are similar in how they think or act).
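One way a program might recognise multi-word units is to look up sequences of words in a list of known compounds, preferring the longest match. The Python sketch below assumes a tiny hand-made list (MULTIWORD) and an invented function name (group_multiword); it is a minimal illustration, not a description of how dictionary software actually works.

```python
# A small, hypothetical list of multi-word units, stored as tuples of words.
MULTIWORD = {
    ("rock", "salt"),
    ("red", "admiral"),
    ("dead", "ringer"),
    ("a", "cappella"),
    ("birds", "of", "a", "feather"),
}
MAX_LEN = max(len(entry) for entry in MULTIWORD)


def group_multiword(tokens):
    """Join adjacent tokens that form a known multi-word unit."""
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest possible sequence first, down to two words.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in MULTIWORD:
                result.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multi-word unit starts here: keep the single token.
            result.append(tokens[i])
            i += 1
    return result


print(group_multiword("She saw a red admiral".split()))
```

This approach only finds units whose words are next to each other. A phrasal verb split by its object, as in take something down, would not be matched.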

Combinations

If that weren't complicated enough, we must also remember that the tricky problems above may be found together in a single word!

For instance, many accented words are found in compounds of some sort: a tête-à-tête is a private meeting between two people, often an informal one. Regular or formal meetings of this sort, especially in the workplace, are known as one-to-ones, sometimes written as 1-2-1s.

A black-eyed pea is a compound noun in which one part is a hyphenated compound.

A will-o'-the-wisp is a hyphenated compound with one word ending in an apostrophe. It's a sort of ghostly spirit. A maître d' is a compound which ends with an apostrophe, and refers to a manager of a restaurant or its waiters. In English, maître is found only in this context, never as a separate word.

Why are these important?
  • When writing a dictionary, we need to think about where to put these so that the user can find them
  • When using a dictionary, we need to think about how best to look the word up
  • Increasingly, the use of online tools means we must think about how those tools work out where word boundaries are
