frank devilbiss

Modeling Enthusiast & Data Scientist

Brewer's Dictionary of Phrase and Fable

Introduction

On Monday night, we were driving to the library and a 30-year-old recording of Casey Kasem’s American Top 40 was playing on the 80s station. To preface the next song, Stevie Wonder’s Skeletons, Casey read an excerpt from Brewer’s Dictionary about skeletons:

The family skeleton, or the skeleton in the cupboard. Some domestic secret that the whole family conspires to keep to itself; every family is said to have at least one. The story is that someone without a single care or trouble in the world had to be found. After long and unsuccessful search a lady was discovered whom all thought would “fill the bill”; but to the great surprise of the inquirers, after she had satisfied them on all points and the quest seemed to be achieved, she took them upstairs and there opened a closet which contained a human skeleton. “I try” said she, “to keep my trouble to myself, but every night my husband compels me to kiss that skeleton.” She then explained that the skeleton was once her husband’s rival, killed in a duel.

After hearing that, I felt like I needed more useless but interesting trivia in my life, so I went searching for Brewer’s Dictionary of Phrase and Fable online. I ended up finding a public domain copy of the dictionary on archive.org.

Problem

After leafing through the dictionary, I wanted to do a more in-depth analysis of its contents. It would also be cool to programmatically pull a random definition to feed my brain, but there was a catch: the text is messy. Here is an excerpt of the cleanest version of the text I could get:

Abundant Number, An. A number the sum 
of whose aliquot parts is greater than itself. 
Thus 12 is an abundant number, because its 
divisors, 1, 2, 3, 4, 6=16, which is greater 
than 12. Cp. DEFICIENT NUMBER, PERFECT 
NUMBER. 



Abus 



Accius Naevius 



Abus (ab'us). An old name of the river 
H umber. See Spenser's Faerie Queene, II, 
x, 16: 

He [Locrine] then encountred, a confused rout, 
Forbye the River that whylome was hight 
The ancient Abus . . . 

See Geoffrey of Monmouth's Chronicles, 
Bk. ii, 2. 
Abyla. See CALPE. 

As you can see, the demarcation of distinct entries is somewhat confusing: stray page-header words (such as the lone “Abus” and “Accius Naevius” lines) interrupt the definitions, and the definitions themselves lack a consistent structure.
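To make the page-header problem concrete, here is a minimal Python sketch of one way to filter those lines out. It assumes a running header always appears as a single short line set off by blank lines and containing no sentence punctuation. The function name `strip_page_headers` and the 40-character cutoff are hypothetical choices for illustration, not something taken from the project.

```python
import re

def strip_page_headers(text, max_len=40):
    """Drop blank-line-delimited blocks that look like running page headers."""
    # Split the raw text into blocks separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", text)
    kept = []
    for block in blocks:
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # Assumed header shape: exactly one short line without . : , or ;
        is_header = (
            len(lines) == 1
            and len(lines[0]) <= max_len
            and not re.search(r"[.:,;]", lines[0])
        )
        if not is_header:
            # Re-join hard-wrapped lines into a single line of text.
            kept.append(" ".join(lines))
    return kept

# A slice of the excerpt, including the two stray header lines.
sample = """divisors, 1, 2, 3, 4, 6=16, which is greater
than 12. Cp. DEFICIENT NUMBER, PERFECT
NUMBER.



Abus



Accius Naevius



Abus (ab'us). An old name of the river
H umber. See Spenser's Faerie Queene, II,
x, 16:
"""

for block in strip_page_headers(sample):
    print(block)
```

A rule this blunt would also throw away any genuine one-line block that happens to lack punctuation, so it is a starting point rather than a complete fix.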

brewersdict

To clean up this awesome reference for future projects, I decided to develop a script that processes the text into a neater, comma-delimited format with the columns Entry, Definition.
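As a rough sketch of that output format, Python's standard `csv` module can write the two columns and will quote any field that itself contains a comma. The two sample rows come from the excerpt above. The in-memory buffer and the `random.choice` lookup are illustrative assumptions, not the project's actual code.

```python
import csv
import io
import random

# Two rows taken from the excerpt; a real run would hold every parsed entry.
rows = [
    ("Abundant Number, An",
     "A number the sum of whose aliquot parts is greater than itself."),
    ("Abyla", "See CALPE."),
]

# Write to an in-memory buffer here; a file opened with newline="" works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Entry", "Definition"])
writer.writerows(rows)

# Read the data back and pick a random entry to feed the brain.
buffer.seek(0)
entries = list(csv.DictReader(buffer))
pick = random.choice(entries)
print(pick["Entry"], "->", pick["Definition"])
```

Because headwords such as "Abundant Number, An" contain commas, letting the `csv` module handle the quoting is safer than joining the fields by hand.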

I used Python and the NLTK tokenizers for this task. Breaking the text down into sentences and words made it easier to group the text together using some general heuristics. The heuristics are not perfect: some of the definition text ends up as entries because of idiosyncrasies in the document. For example, some of the definitions contain quotes that look very similar to entries.
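The grouping idea can be sketched in a few lines of Python. To keep the example self-contained, it swaps the NLTK tokenizers for a naive regex sentence splitter. The heuristic itself (a short, capitalized first sentence followed by more text is a headword) is an assumption made for illustration, as are the names `split_sentences`, `group_entries`, and `max_head_words`. The sample blocks are hand-cleaned versions of the excerpt.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def group_entries(blocks, max_head_words=4):
    """Group text blocks into (entry, definition) pairs."""
    entries = []
    for block in blocks:
        sentences = split_sentences(block)
        if not sentences:
            continue
        head, rest = sentences[0], sentences[1:]
        # Assumed heuristic: a headword is a short, capitalized first sentence
        # ending in a period and followed by at least one more sentence.
        looks_like_entry = (
            head[0].isupper()
            and head.endswith(".")
            and len(head.split()) <= max_head_words
            and bool(rest)
        )
        if looks_like_entry:
            entries.append([head.rstrip("."), " ".join(rest)])
        elif entries:
            # Otherwise treat the block as a continuation of the previous entry.
            entries[-1][1] += " " + block.strip()
    return [tuple(e) for e in entries]

blocks = [
    "Abundant Number, An. A number the sum of whose aliquot parts is greater than itself.",
    "Abus (ab'us). An old name of the river Humber. See Spenser's Faerie Queene, II, x, 16:",
    "He [Locrine] then encountred, a confused rout, Forbye the River that whylome was hight The ancient Abus . . .",
    "Abyla. See CALPE.",
]

for entry, definition in group_entries(blocks):
    print(f"{entry!r}: {definition!r}")
```

Here the verse is folded into the Abus definition because its first sentence is too long to pass as a headword. A quoted block that opens with a short sentence would be mistaken for a new entry, which is the same kind of failure described above.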

Check out the project on GitHub if you are interested in looking at more entries or making use of the tokenized definitions.

Check this out!

My favorite definition so far is:

Reduplicated or Ricochet Words.

There are probably some hundreds of these words, which usually have an intensifying force, in use in English. The following, from ancient and modern sources, will give some idea of their variety: chit-chat, click-clack, clitter-clatter, dilly-dally, ding-dong, drip-drop, fal-lal, flim-flam, fiddle-faddle, flip-flap, flip-flop, hanky-panky, harum-scarum, helter-skelter, heyve-keyve, higgledy-piggledy, hob-nob, hodge-podge, hoity-toity, hubble-bubble, hugger-mugger, hurly-burly, mingle-mangle, mish-mash, mixy-maxy, namby-pamby, niddy-noddy, niminy-piminy, nosy-posy, pell-mell, ping-pong, pit-pat, pitter-patter, pribbles and prabbles, randem-tandem, randy-dandy, razzle-dazzle, riff-raff, roly-poly, shilly-shally, slip-slop, slish-slosh, tick-tack, tip-top, tittle-tattle, wibble-wobble, wig-wag, wiggle-waggle, wish-wash, wishy-washy.

What does heyve-keyve mean?